Abstract

In this paper, we consider coding schemes for computationally bounded channels, which can
introduce an arbitrary set of errors as long as (a) the fraction of errors is bounded with high
probability by a parameter p and (b) the process which adds the errors can be described by
a sufficiently "simple" circuit. Codes for such channel models are attractive since, like codes
for standard adversarial errors, they can handle channels whose true behavior is unknown or
varying over time.

For three classes of channels, we provide explicit, efficiently encodable/decodable codes of
optimal rate where only inefficiently decodable codes were previously known. In each case,
we provide one encoder/decoder that works for every channel in the class. The encoders are
randomized, and probabilities are taken over the (local, unknown to the decoder) coins of the
encoder and those of the channel.

Unique decoding for additive errors. We give the first construction of a poly-time
encodable/decodable code for additive (a.k.a. oblivious) channels that achieves the Shannon capacity
1 - H(p). These are channels which add an arbitrary error vector e ∈ {0,1}^N of weight at most
pN to the transmitted word; the vector e can depend on the code but not on the particular
transmitted word. Such channels capture binary symmetric errors and burst errors as special
cases.

List-decoding for online log-space channels. A space-S(N) bounded channel reads and
modifies the transmitted codeword as a stream, using at most S(N) bits of workspace on
transmissions of N bits. For constant S, this captures many models from the literature, including
discrete channels with finite memory and arbitrarily varying channels. We give an efficient code
with optimal rate (arbitrarily close to 1 - H(p)) that recovers a short list containing the correct
message with high probability for channels limited to logarithmic space.

List-decoding for poly-time channels. For any constant c we give a similar list-decoding
result for channels describable by circuits of size at most N^c, assuming the existence of
pseudorandom generators. We are not aware of any channel models considered in the information
theory literature, other than purely adversarial channels, which require more than linear-size
circuits to implement.
1 Introduction



For the binary symmetric channel which flips each transmitted bit independently with probability
p < 1/2, the optimal rate of reliable transmission is known to be the Shannon capacity 1 - H(p),
where H(·) is the binary entropy function [34]. Moreover, concatenated codes (Forney [12]) and
polar codes (Arikan [ ]) can transmit at rates arbitrarily close to this capacity and are efficiently
decodable. In contrast, for adversarial channels that can corrupt up to a fraction p of symbols in
an arbitrary manner, the optimal rate is unknown in general, though it is known for all p ∈ (0, 1/2)
that the rate has to be much smaller than the Shannon capacity. In particular, for p ∈ [1/4, 1/2),
the achievable rate over an adversarial channel is asymptotically zero, while the Shannon capacity
1 - H(p) remains positive. Determining the best asymptotic rate for error fraction p (equivalently,
minimum relative distance 2p) remains an important open question in combinatorial coding theory.
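
As a small numerical illustration of the capacity expression above, the following Python sketch evaluates the binary entropy function and 1 - H(p) at a few error rates. The function names are ours and are not taken from the paper.

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def shannon_capacity(p: float) -> float:
    """Capacity 1 - H(p) of the binary symmetric channel with flip probability p."""
    return 1.0 - binary_entropy(p)

# Tabulate the capacity at a few illustrative error rates.
for p in (0.05, 0.11, 0.25, 0.40):
    print(f"p = {p:.2f}   1 - H(p) = {shannon_capacity(p):.4f}")
```

At p = 1/4 the capacity is still about 0.19, whereas the achievable rate against fully adversarial errors is already asymptotically zero.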

Codes that tolerate adversarial errors are attractive because they can transmit reliably over 
a large range of channels whose true behavior is unknown or varies over time. In contrast, codes 
tailored to a specific channel model tend to fail when the model changes. For example, concatenated 
codes, which can transmit efficiently and reliably at the Shannon capacity with i.i.d. errors, fail 
miserably in the presence of burst errors that occur in long runs. 

In this paper, we consider several intermediate models of uncertain channels. Specifically, we
consider computationally bounded channels, which can introduce an arbitrary set of errors as long
as (a) the total fraction of errors is bounded by p with high probability and (b) the process which
adds the errors can be described by a sufficiently "simple" circuit. The idea behind these models
is that natural processes may be mercurial, but are not computationally intensive. These models
are powerful enough to capture natural settings like i.i.d. and burst errors, but weak enough to
allow efficient communication arbitrarily close to the Shannon capacity 1 - H(p). The models we
study, or close variants, have been considered previously; see Section 2 for a discussion of related
work. The computational perspective we espouse is inspired by the works of Lipton [27] and Micali
et al. [28].

For three classes of channels, we provide efficiently encodable and decodable codes of optimal
rate (arbitrarily close to 1 - H(p)) where only inefficiently decodable codes were previously
known. In each case, we provide one encoder/decoder that works for every channel in the class. In
particular, our results apply even when the channel's behavior depends on the code.

Structure of this paper. We first describe the models and our results briefly (Section 1.1), and
outline our main technical contributions (Section 1.2). In Section 2, we describe related lines of work
aimed at handling (partly) adversarial errors with rates near Shannon capacity. Our results are
stated formally in Section 3. Section 4 describes our list-decoding-based constructions of codes
for additive errors. Section 5 describes our efficient constructions at a high level. The remainder of
the paper describes and analyzes the constructions in detail, in order of increasing channel strength:
additive errors (Section 6), space-bounded errors (Section 7) and time-bounded errors (Section 8).
The appendices contain extra details on the building blocks in our constructions (A), results for
the "average" error criterion (B) and our impossibility result (C).

1.1 Our results 

The encoders we construct are stochastic (that is, randomized). Probabilities are taken over the
(local, unknown to the decoder) coins of the encoder and the choices of the channel; messages may
be chosen adversarially and known to the channel. Our results do not assume any setup or shared
randomness between the encoder and decoder.

Unique decoding for additive channels. We give the first explicit construction of stochastic
codes with polynomial-time encoding/decoding algorithms that approach the Shannon capacity
1 - H(p) for additive (a.k.a. oblivious) channels. These are channels which add an arbitrary error
vector e ∈ {0,1}^N of Hamming weight at most pN to the transmitted codeword (of length N).
The error vector may depend on the code and the message but, crucially, not on the encoder's
local random coins. Additive errors capture binary symmetric errors as well as certain models
of correlated errors, like burst errors. For a deterministic encoder, the additive error model is
equivalent to the usual adversarial error model. A randomized encoder is thus necessary to achieve
the Shannon capacity.

We also provide a novel, simple proof that (inefficient) capacity-achieving codes exist for additive
channels. We do so by combining linear list-decodable codes with rate approaching 1 - H(p) (known
to exist, but not known to be efficiently decodable) with a special type of authentication scheme.
Previous existential proofs relied on complex random coding arguments [G, 24]; see the discussion
of related work below.

List decoding for space-bounded channels. The additive errors model postulates that the
error vector has to be picked obliviously, before seeing the codeword. To model more complex
processes, we consider a channel that processes the codeword as a stream, deciding as it goes which
positions to corrupt. The channel's only limitation is a bound S(N) on the amount of work space it
can use (as a function of the block length N). Roughly, we view the channel as a finite automaton
with 2^{S(N)} states. More precisely, in order to allow nonuniform dependency on the code, we model
the channel as a width-2^{S(N)} branching program that outputs one bit for every input bit that it
reads. Even for constant space S, this model captures a wide range of channels considered in
coding theory, including additive channels, discrete channels with finite memory, bounded delay
and arbitrarily varying channels; see the discussion of related work below for definitions. Our
constructions tolerate the larger class of logarithmic-space channels. As above, we assume that the
channel introduces at most pN errors with high probability.

First, we show that reliable unique decoding with positive rate is impossible even if one only
wants to tolerate an arbitrary memoryless channel, when p > 1/4. The idea is that even a memoryless
adversary can make the transmitted codeword difficult to distinguish from a different, random
codeword. The proof relies on the requirement that a single code must work for all channels, since
the "hard" channel depends on the code.

Thus, to communicate at a rate close to 1 - H(p) for all p, we consider the relaxation to
list-decoding: the decoder is allowed to output a small list of messages, one of which is correct.
List-decodable codes with rate approaching 1 - H(p) are known to exist even for adversarial errors
[41, 11]. However, constructing efficient (i.e., polynomial-time encodable and decodable) codes for
list decoding with near-optimal rate is a major open problem.

Our main contribution for space-bounded channels is a construction of polynomial-time
list-decodable codes that approach the optimal rate for channels whose space bound is logarithmic in
the block length N. Specifically, for every message m and log-space channel W, the decoder takes
W(Enc(m; r)) as input and returns a small list of messages that, with high probability over r and
the coins of the channel, contains the real message m. The size of the list is polynomial in 1/ε,
where N(1 - H(p) - ε) is the length of the transmitted messages. Note that the decoder need
not return all words within a distance pN of the received word (as is the case for the standard
"combinatorial" notion of list decoding), but it must return the correct message as one of the
candidates with high probability. This notion of list-decoding is natural for stochastic codes in
communication applications. In fact, it is exactly the notion that is needed in constructions which
"sieve" the list, such as [14, 28]; see related work in Section 2.

Our results raise a compelling question: are there stochastic codes of rate approaching 1 - H(p)
that can be uniquely decoded from pN log-space errors, when p < 1/4?

List decoding for polynomial time channels. More generally, one may consider channels
whose behavior on N-bit inputs is described by a circuit of size T(N). Logarithmic space
channels, in particular, can be realized by polynomial-size circuits. In fact, we do not know of any
channel models considered in the information theory literature, other than purely adversarial
channels, which require more than linear time to implement. Our construction of list-decodable codes
for logarithmic-space channels can be extended to handle channels with a given polynomial time
bound T(N) = N^c, for any fixed c > 1, under an additional assumption, namely, the existence
of pseudorandom generators of constant stretch that output pseudorandom bits and fool
circuits of size N^c. Such generators exist, for example, if there are functions in E which have no
subexponential-size circuits [-SO, 21], or if one-way functions exist [40, 19].

For all three models, our constructions require the development of new methods for applying
tools from cryptography and derandomization to coding-theoretic problems. We give a brief
discussion of these techniques next. A more detailed discussion of the approach behind our code
construction appears in Section 5.

1.2 Techniques 

Control/payload construction. In our constructions, we develop several new techniques. The
first is a novel "reduction" from the standard coding setting with no setup to the setting of shared 
secret randomness. In models in which errors are distributed evenly, such a reduction is relatively 
simple [1]; however, this reduction fails against adversarial errors. Instead, we show how to hide 
the secret randomness (the control information) inside the main codeword (the payload) in such 
a way that the decoder can learn the control information but (a) the control information remains 
hidden to a bounded channel and (b) its encoding is robust to a certain, weaker class of errors. We 
feel this technique should be useful in other settings of bounded adversarial behavior. 

Our reduction can also be viewed as a novel way of bootstrapping from "small" codes, which 
can be decoded by brute force, to "large" codes, which can be decoded efficiently. The standard 
way to do this is via concatenation; unfortunately, concatenation does not work well even against 
mildly unpredictable models, such as the additive error model. 

Pseudorandomness. Second, our results further develop a surprising connection between coding 
and pseudorandomness. Hiding the "control information" from the channel requires us to make 
different settings of the control information indistinguishable from the channel's point of view. 
Thus, our proofs apply techniques from cryptography together with constructions of pseudorandom 
objects (generators and samplers) from derandomization. Typically, the "tests" that must be fooled 
are compositions of the channel (which we assume has low complexity) with some subroutine of the 
decoder (which we design to have low complexity). The connection to pseudorandomness appeared
in a simpler form in the previous work on bounded channels [27, 13, 28]; our use of this connection
is significantly more delicate.



2 Background and Related Previous Work 

There are several lines of work aimed at handling adversarial, or partly adversarial, errors with 
rates near the Shannon capacity. We survey them briefly here and highlight the relationship to our 
results. 

List decoding. List decoding was introduced in the late 1950s [10, 39] and has witnessed a lot of
recent algorithmic work (cf. the survey [15]). Under list decoding, the decoder outputs a small list
of messages that must include the correct message. Random coding arguments demonstrate that
there exist binary codes of rate 1 - H(p) - ε which can tolerate pN adversarial errors if the decoder is
allowed to output a list of size O(1/ε) [11, 41, 16]. The explicit construction of binary list-decodable
codes with rate close to 1 - H(p), however, remains a major open question. We provide such codes
for the special case of corruptions introduced by space- or time-bounded channels.

Adding Setup: Shared Randomness. Another relaxation is to allow randomized coding strategies
where the sender and receiver share "secret" randomness, hidden from the channel, which is
used to pick a particular, deterministic code at random from a family of codes. Such randomized
strategies were called private codes in [ ]. Using this secret shared randomness, one can transmit
at rates approaching 1 - H(p) against adversarial errors (for example, by randomly permuting
the symbols and adding a random offset [27, 23]). Using explicit codes achieving capacity on the
BSC_p [12], one can even get such randomized codes of rate approaching 1 - H(p) explicitly (although
getting an explicit construction with o(n) randomness remains an open problem [ ]). A
related notion of setup is the public key model of Micali et al. [28], in which the sender generates a
public key that is known to the receiver and possibly to the channel. This model only makes sense
for computationally bounded channels, discussed below.

Our constructions are the first (for all three models) which achieve rate 1 - H(p) with efficient
decoding and no setup assumptions.

AVCs: Oblivious, nonuniform errors. A different approach to modeling uncertain channels is 
embodied by the rich literature on arbitrarily varying channels (AVCs), surveyed in [26]. Despite 
being extensively investigated in the information theory literature, AVCs have not received much 
algorithmic attention. 

An AVC is specified by a finite state space S and a family of memoryless channels {W_s : s ∈ S}.
The channel's behavior is governed by its state, which is allowed to vary arbitrarily. The AVC's
behavior in a particular execution is specified by a vector s = (s_1, ..., s_N) ∈ S^N: the channel applies
the operation W_{s_i} to the ith bit of the codeword. A code for the AVC is required to transmit
reliably with high probability for every sequence s, possibly subject to some state constraint. Thus
AVCs model uncertainty via the nonuniform choice of the state vector s ∈ S^N. However (and
this is one of the key differences that makes the bounded space model more powerful), the
choice of vector s in an AVC is oblivious to the codeword; that is, the channel cannot look at the
codeword to decide the state sequence.

The additive errors channel we consider is captured by the AVC framework. Indeed, consider
the simple AVC where S = {0, 1} and when in state s, the channel adds s mod 2 to the input bit.
With the state constraint Σ_{i=1}^N s_i ≤ pN on the state sequence (s_1, s_2, ..., s_N) of the AVC, this
models additive errors, where an arbitrary error vector e with at most p fraction of 1's is added to
the codeword by the channel, but e is chosen obliviously of the codeword.

Csiszar and Narayan determined the capacity of AVCs with state constraints [7, 8]. In particular,
for the additive case, they showed that random codes can achieve rate approaching 1 - H(p)
while correcting any specific error pattern e of weight pN with high probability.^1 Note that codes
providing this guarantee cannot be linear, since the bad error vectors are the same for all codewords
in a linear code. The decoding rule used in [7] to prove this claim was quite complex, and it was
simplified to the more natural closest codeword rule in [n]. Langberg [ ] revisited this special
case (which he called an oblivious channel) and gave another proof of the above claim, based on a
different random coding argument.

As outlined above, we provide two results for this model. First, we give a new, more modular
existential proof. More importantly, we provide the first explicit constructions of codes for this
model which achieve the optimal rate 1 - H(p).

Polynomial-time bounded channels. In a different vein, Lipton [27] considered channels whose 
behavior can be described by a polynomial-time algorithm. He showed how a small amount of 
secret shared randomness (the seed for a pseudorandom generator) could be used to communicate 
at the Shannon capacity over any polynomial-time channel that introduces a bounded number of 
errors. Micali et al. gave a similar result in a public key model; however, their result relies on 
efficiently list-decodable codes, which are only known with sub-optimal rate. Both results assume 
the existence of one-way functions and some kind of setup. On the positive side, in both cases the 
channel's time bound need not be known explicitly ahead of time; one gets a trade-off between the 
channel's time and its probability of success. 

Our list decoding result removes the setup assumptions of [27, 28] at the price of imposing a
specific polynomial bound on the channel's running time and relaxing to list-decoding. However,
our result also implies stronger unique decoding results in the public-key model [28]. Specifically,
our codes can be plugged into the construction of Micali et al. to get unique decoding at rates up
to the Shannon capacity when the sender has a public key known to the decoder (and possibly to
the channel). The idea, roughly, is to sign messages before encoding them; see [28] for details.

Ostrovsky, Pandey and Sahai [31] and Hemenway and Ostrovsky [20] considered the construction
of locally decodable codes in the presence of computationally bounded errors assuming some setup
(private [31] and public [20] keys, respectively). The techniques used for locally decodable codes
are quite different from those used in more traditional coding settings; we do not know if the ideas
from our constructions can be used to remove the setup assumptions from [31, 20].

Logarithmic-space channels. Galil et al. [13] considered a slightly weaker model, logarithmic
space, that still captures most physically realizable channels. They modeled the channel as a finite
automaton with polynomially-many states. Using Nisan's generator for log-space machines [29],
they removed the assumption of one-way functions from Lipton's construction in the shared
randomness model [27].

We add nonuniformity to their model to get a common generalization of arbitrarily varying
channels. Our code construction for logarithmic-space channels removes the assumption of shared
setup in the model of [13], but achieves only list decoding. This relaxation is necessary for some
parameter ranges, since unique decoding in this model is impossible when p > 1/4.

^1 The AVC literature usually discusses the "average error criterion", in which the code is deterministic but the
message is assumed to be uniformly random and unknown to the channel. We prefer the "stochastic encoding" model,
in which the message is chosen adversarially, but the encoder has local random coins. The stochastic encoding model
is strictly stronger than the average error model as long as the decoder recovers the encoder's random coins along with
the message. The arguments of Csiszar and Narayan [7] and Langberg [ ] also apply to the stronger model.

Online Codes. Our logarithmic-space channel can also be seen as a restriction of online, or causal,
channels, recently studied by Dey, Jaggi and Langberg [9, 25]. These channels make a single pass
through the codeword, introducing errors as they go. They are not restricted in either space usage
or computation time. It is known that codes for online channels cannot achieve the Shannon rate;
specifically, the achievable asymptotic rate is at most max(1 - 4p, 0) [9]. Our impossibility result,
which shows that the rate of codes for time- or space-bounded channels is asymptotically 0 for p > 1/4,
can be seen as a partial extension of the online channels results of [9, 25] to computationally-bounded
channels, though our proof technique is quite different.

3 Statements of Results 

Recall the notion of stochastic codes: A stochastic binary code of rate R ∈ (0, 1) and block length
N is given by an encoding function Enc : {0,1}^{RN} × {0,1}^b → {0,1}^N which encodes the RN
message bits, together with some additional random bits, into an N-bit codeword. Here, N and b
are integers, and we assume for simplicity that RN is an integer.

3.1 Codes for worst-case additive errors 

Existential result via list decoding. We give a novel construction of stochastic codes for additive 
errors by combining linear list-decodable codes with a certain kind of authentication code called 
algebraic manipulation detection (AMD) codes. Such AMD codes can detect additive corruption 
with high probability, and were defined and constructed for cryptographic applications in [5]. The 
linearity of the list-decodable code is therefore crucial to make the combination with AMD codes 
work. The linearity ensures that the spurious messages output by the list-decoder are all additive 
offsets of the true message and depend only on the error vector (and not on m, r). An additional
feature of our construction is that even when the fraction of errors exceeds p, the decoder outputs a
decoding failure with high probability (rather than decoding incorrectly). This feature is important
when using these codes as a component in our explicit construction, mentioned next.
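
The role of linearity can be checked by brute force on a toy example. The sketch below is ours, not the paper's construction: the generator matrix `G` is a hypothetical [6, 3] binary linear code chosen only for illustration, and `list_decode` is exhaustive search. It verifies that the set of offsets between the candidates in the list and the true message is the same for every message once the error vector is fixed.

```python
from itertools import product

# Hypothetical toy generator matrix of a [6, 3] binary linear code (illustration only).
G = [
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
K, N = len(G), len(G[0])

def encode(msg):
    """Multiply the message row vector by G over GF(2)."""
    return tuple(sum(msg[i] * G[i][j] for i in range(K)) % 2 for j in range(N))

def xor(a, b):
    return tuple(x ^ y for x, y in zip(a, b))

def list_decode(received, radius):
    """Brute force: all messages whose codeword is within `radius` of `received`."""
    return [
        msg
        for msg in product((0, 1), repeat=K)
        if sum(xor(encode(msg), received)) <= radius
    ]

def offsets(msg, error, radius):
    """Set of differences (candidate XOR true message) over the decoder's list."""
    received = xor(encode(msg), error)
    return frozenset(xor(cand, msg) for cand in list_decode(received, radius))

error = (1, 1, 0, 0, 0, 0)
# Collect the offset set for every possible message; linearity makes them coincide.
offset_sets = {offsets(msg, error, radius=2) for msg in product((0, 1), repeat=K)}
```

Because `offset_sets` contains a single set, the spurious candidates are determined by the error vector alone. This is the property that allows an authentication code detecting additive corruption to be combined with the list decoder.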

The formal result is stated below. Details can be found in Section 4. The notation Ω_{p,ε} expresses
an asymptotic lower bound in which p and ε are held constant.

Theorem 3.1. For every p, 0 < p < 1/2, and every ε > 0, there exists a family of stochastic codes of
rate R ≥ 1 - H(p) - ε and a deterministic (exponential time) decoder Dec : {0,1}^N → {0,1}^{RN} ∪ {⊥}
such that for every m ∈ {0,1}^{RN} and every error vector e ∈ {0,1}^N of Hamming weight at most
pN, Pr_r[Dec(Enc(m, r) + e) = m] ≥ 1 - 2^{-Ω_{p,ε}(N)}. Moreover, when more than a fraction p of
errors occur, the decoder is able to detect this and report a decoding failure (⊥) with probability at
least 1 - 2^{-Ω_{p,ε}(N)}.

Given an explicit family of linear binary codes of rate R that can be efficiently list-decoded from 
fraction p of errors with list-size bounded by a polynomial function in N, one can construct an 
explicit stochastic code of rate R — o(l) with the above guarantee along with an efficient decoder. 

Explicit, efficient codes achieving capacity. Explicit binary list-decodable codes of optimal
rate are not known, so one cannot use the above connection to construct explicit stochastic codes of
rate ≈ 1 - H(p) for pN additive errors. Nevertheless, we give an explicit construction of
capacity-achieving stochastic codes against worst-case additive errors. The construction is described at a
high level in Section 5 and in further detail in Section 6.

Theorem 3.2. For every p ∈ (0, 1/2), every ε > 0, and infinitely many N, there is an explicit,
efficient stochastic code of block length N and rate R ≥ 1 - H(p) - ε which corrects a p fraction
of additive errors with probability 1 - o(1). Specifically, there are polynomial time algorithms Enc
and Dec such that for every message m ∈ {0,1}^{RN} and every error vector e of Hamming weight at
most pN, we have Pr_r[Dec(Enc(m; r) + e) = m] = 1 - exp(-Ω_{p,ε}(N / log^2 N)).

A slight modification of our construction gives codes for the "average error criterion," in which 
the code is deterministic but the message is assumed to be uniformly random and unknown to the 
channel. 

3.2 Codes for online log-space bounded channels 

We generalize the model of Galil et al. [13] to capture both finite automaton-based models as well 
as arbitrarily varying channels. To model channels (as opposed to Boolean functions), we augment 
standard branching programs with the ability to output bits at each step. 

Definition 1 (Space bounded channels). An online space-S channel is a read-once branching
program of width ≤ 2^S that outputs one bit at each computation step. Specifically, let Q = {0,1}^S
be a set of 2^S states. For input length N, the channel is given by a sequence of N transition
functions F_i : Q × {0,1} → Q × {0,1}, for i = 1 to N, along with a start state q_0 ∈ Q. On input
x = (x_1 x_2 ··· x_N) ∈ {0,1}^N, the channel computes (q_i, y_i) = F_i(q_{i-1}, x_i) for i = 1 to N. The output
of the channel, denoted A(x), is y = (y_1 y_2 ··· y_N) ∈ {0,1}^N.

A randomized online space-S channel is a probability distribution over the space of deterministic
online space-S channels. For a given input x, such a channel induces a corresponding distribution
on outputs. A randomized channel 𝒜 is pN-bounded with probability 1 - β if, for all inputs
x ∈ {0,1}^N, with probability at least 1 - β, the channel flips at most pN bits of x, that is,
Pr_{A ∈ 𝒜}[weight(x ⊕ A(x)) > pN] ≤ β.
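
To make the definition concrete, here is a minimal Python sketch of a deterministic online channel given by per-position transition functions. It is an illustration under our own assumptions: states are represented as small integers rather than bit strings in {0,1}^S, and `budgeted_flipper` is a hypothetical example channel, not one taken from the paper.

```python
import random

def run_channel(transitions, start_state, x):
    """Run a deterministic online channel: (q_i, y_i) = F_i(q_{i-1}, x_i)."""
    state, output = start_state, []
    for f_i, x_i in zip(transitions, x):
        state, y_i = f_i(state, x_i)
        output.append(y_i)
    return output

def budgeted_flipper(budget, period):
    """Hypothetical example channel: flips every `period`-th bit while a counter
    kept in the state is below `budget`. The state is just that counter, so
    about log2(budget + 1) bits of space suffice."""
    def make_step(i):
        def step(state, bit):
            if i % period == 0 and state < budget:
                return state + 1, bit ^ 1
            return state, bit
        return step
    return make_step

def weight_of_error(x, y):
    """Hamming weight of x XOR y, i.e. the number of flipped positions."""
    return sum(a ^ b for a, b in zip(x, y))

N, p = 32, 0.125
make_step = budgeted_flipper(budget=int(p * N), period=3)
transitions = [make_step(i) for i in range(N)]  # one F_i per position (nonuniform)
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(N)]
y = run_channel(transitions, 0, x)
```

Storing the error count in the state is one way to obtain a hard bound of pN errors with logarithmic space. A randomized channel in the sense of the definition would be a distribution over such lists of transition functions.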

We exhibit a very simple "zero space" channel that rules out achieving any positive rate (i.e.,
the capacity is zero) when p > 1/4. In each position, the channel either leaves the transmitted
bit alone, sets it to 0, or sets it to 1. The channel works by "pushing" the transmitted codeword
towards a different valid codeword (selected at random). This simple channel adds at most N/4
errors in expectation. We can get a channel with a hard bound on the number of errors by allowing
it logarithmic space. Our impossibility result can be seen as strengthening a result by Dey et al. [9]
for online channels in the special case where p > 1/4, though our proof technique, adapted from
Ahlswede [ ], is quite different. Appendix C contains a detailed proof.
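
One way such a "pushing" channel could look is sketched below in Python. This is our own reading of the description above, not the construction from Appendix C: we assume that each position is pushed toward the second codeword independently with probability 1/2, and random bit strings stand in for the two codewords.

```python
import random

def pushing_channel(target, rng):
    """Per-position actions for a memoryless channel that pushes the transmitted
    word toward `target` (a second codeword chosen by the channel).

    Each action is 'keep', 'set0' or 'set1', and no state is carried between
    positions. Assumption of this sketch: a position is pushed with
    probability 1/2, independently of the transmitted word.
    """
    actions = []
    for t in target:
        if rng.random() < 0.5:
            actions.append("keep")
        else:
            actions.append("set1" if t else "set0")
    return actions

def apply_actions(actions, x):
    """Apply the fixed per-position actions to the transmitted word x."""
    return [bit if a == "keep" else (1 if a == "set1" else 0)
            for a, bit in zip(actions, x)]

rng = random.Random(1)
N = 1000
sent = [rng.randint(0, 1) for _ in range(N)]    # stands in for Enc(m; r)
target = [rng.randint(0, 1) for _ in range(N)]  # stands in for a second, random codeword
received = apply_actions(pushing_channel(target, rng), sent)
errors = sum(a ^ b for a, b in zip(sent, received))
```

An error occurs only where the two words differ and the position is pushed, so for two words at distance about N/2 the expected number of errors is about N/4. Every received bit agrees with one of the two words, which is what makes them hard to tell apart.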

Theorem 3.3 (Unique decoding is impossible for p > 1/4). For every pair of randomized
encoding/decoding algorithms Enc, Dec that make N uses of the channel and use a message space whose
size tends to infinity with N, for every 0 < ν < 1/4, there is an online space-⌈log(N)⌉ channel W_2
that alters at most N(1/4 + ν) bits and causes a uniformly random message to be incorrectly decoded
with probability at least C · ν for an absolute constant C.
For list-decoding, we provide a positive result, namely, a construction of codes with rate
approaching 1 - H(p) that efficiently recover a short list containing the correct message when the
channel uses logarithmic space. For details of the construction and analysis, see Section 7. The
structure of the code is similar to the uniquely decodable code for additive errors; however,
additional work is needed to make the codewords appear pseudorandom to the channel, and the analysis
is more subtle.

Theorem 3.4. For every p ∈ (0, 1/2) and constant ε > 0, there is an efficient Monte Carlo
construction of a stochastic code with encoding/decoding algorithms (Enc, Dec) such that for every
message m ∈ {0,1}^{RN} and every randomized online space-S channel W_S on N input bits
that is pN-bounded (where Ω(log N) ≤ S ≤ o(N / log N)), with high probability over the choice of
coins r and the errors introduced by W_S, Dec(W_S(Enc(m; r))) outputs a list of at most poly(1/ε)
messages that includes the correct message m.

The probability of incorrect decoding is at most N2^^^''^^^ + 2^^(^^^/'^), and the running time
of (Enc, Dec) is polynomial in N and 2^S (and therefore polynomial in N for log-space channels).

3.3 List-decoding for Time-bounded Channels 

Finally, we prove a similar result for time-bounded channels, assuming the existence of certain
pseudorandom generators (which in turn follow from standard complexity assumptions). The model
here is easy to describe: it suffices that the channel be implementable by a circuit of size N^c for
some c ≥ 1.

Theorem 3.5. Assume either E ⊄ SIZE(2^{ε_0 n}) for some ε_0 > 0 or the existence of one-way
functions. For all constants ε > 0, p ∈ (0, 1/2), and c ≥ 1, and for infinitely many integers N, there
exists a Monte Carlo construction (succeeding with probability 1 - N^{-Ω(1)}) of a stochastic code
of block length N and rate R ≥ 1 - H(p) - ε with N^{O(c)} time encoding/list decoding algorithms
(Enc, Dec) that have the following property: For all messages m ∈ {0,1}^{RN}, and all pN-bounded
channels W that are implementable by a size O(N^c) circuit, Dec(W(Enc(m; r))) outputs a list of at
most poly(1/ε) messages that includes the real message m with probability at least 1 - N^{-Ω(1)}.

4 List decoding implies codes for worst-case additive errors 

In this section, we will demonstrate how to use good linear list-decodable codes to get good
stochastic codes. The conversion uses the list-decodable code as a black-box and loses only a negligible
amount in rate. In particular, by using binary linear codes that achieve list decoding capacity,
we get stochastic codes which achieve the capacity for additive errors. The linearity of the code is
crucial for this construction. The other ingredient we need for the construction is an authentication
code (called an algebraic manipulation detection (AMD) code) that can detect additive corruption
with high probability [5].

4.1 Some coding terminology 

We begin with the definitions relating to list decoding and stochastic codes for additive errors. 
Definition 2 (List decodable codes). For a real $p$, $0 < p < 1$, and an integer $L \ge 1$, a code $C \subseteq \Sigma^n$
is said to be $(p,L)$-list decodable if for every $y \in \Sigma^n$ there are at most $L$ codewords of $C$ within
Hamming distance $pn$ from $y$. If for every $y$ the list of $\le L$ codewords within Hamming distance
$pn$ from $y$ can be found in time polynomial in $n$, then we say $C$ is efficiently $(p,L)$-list decodable.
Note that $(p,1)$-list decodability is equivalent to the distance of $C$ being greater than $2pn$. □

An efficiently (p, L)-list decodable code can be used for communication on the ADVp channel 
with the guarantee that the decoder can always find a list of at most L messages that includes the 
correct message. 
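To make Definition 2 concrete, the following sketch list-decodes a toy binary code by exhaustive search and computes the smallest list size $L$ for a given radius. The toy code (each of two message bits repeated three times) and all helper names are assumptions chosen for illustration; they are not objects from the paper, and practical list decoders avoid brute force.

```python
from itertools import product

def hamming(u, v):
    """Number of positions where two equal-length tuples differ."""
    return sum(a != b for a, b in zip(u, v))

def list_decode(codewords, y, radius):
    """Return every codeword within Hamming distance `radius` of y."""
    return [c for c in codewords if hamming(c, y) <= radius]

def max_list_size(codewords, n, radius):
    """Smallest L such that the code is (radius/n, L)-list decodable."""
    return max(len(list_decode(codewords, y, radius))
               for y in product((0, 1), repeat=n))

# Toy code (illustrative assumption): each of 2 message bits is repeated 3 times,
# giving 4 codewords of length 6 with minimum distance 3.
CODE = [tuple([a] * 3 + [b] * 3) for a, b in product((0, 1), repeat=2)]
```

With radius 1 the list always has size 1, matching the remark that $(p,1)$-list decodability corresponds to distance greater than $2pn$; with radius 2 some received words have two candidates.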

Definition 3 (Stochastic codes and their decodability). A stochastic binary code of rate $R$ and
block length $n$ is given by an encoding function $\mathrm{Enc} : \{0,1\}^{Rn} \times \{0,1\}^b \to \{0,1\}^n$ which encodes the
$Rn$ message bits together with some additional random bits into an $n$-bit codeword.

Such a code is said to be (efficiently) $p$-decodable with probability $1 - \delta$ if there is a (deterministic
polynomial time computable) decoding function $\mathrm{Dec} : \{0,1\}^n \to \{0,1\}^{Rn} \cup \{\perp\}$ such that for every
$m \in \{0,1\}^{Rn}$ and every error vector $e \in \{0,1\}^n$ of Hamming weight at most $pn$, with probability at
least $1 - \delta$ over the choice of a random string $\omega \in \{0,1\}^b$, we have

$$\mathrm{Dec}(\mathrm{Enc}(m, \omega) + e) = m.$$

Though we do not require it in the definition, our constructions in this section of stochastic 
codes from list-decodable codes will also have the desirable property that when the number of errors 
exceeds pn, with high probability the decoder will output a decoding failure rather than decoding 
incorrectly. 

4.2 Algebraic manipulation detection (AMD) codes 

The following is not the most general definition of AMD codes from [ ], but will suffice for our
purposes and is the one we will use.

Definition 4. Let $\mathcal{G} = (G_1, G_2, G_3)$ be a triple of abelian groups (whose group operations are written
additively) and $\delta > 0$ be a real. Let $G = G_1 \times G_2 \times G_3$ be the product group (with component-wise
addition). A $(\mathcal{G},\delta)$-algebraic manipulation detection code, or $(\mathcal{G},\delta)$-AMD code for short, is given by a map
$f : G_1 \times G_2 \to G_3$ with the following property:

For every $x \in G_1$, and all $\Delta \in G$, $\Pr_{r \in G_2}\bigl[D((x, r, f(x,r)) + \Delta) \notin \{x, \perp\}\bigr] \le \delta$,

where the decoding function $D : G \to G_1 \cup \{\perp\}$ is given by $D((x,r,s)) = x$ if $f(x,r) = s$ and
$\perp$ otherwise. The tag size of the AMD code is defined as $\log|G_2| + \log|G_3|$; it is the number of
bits the AMD encoding appends to the source. □

Intuitively, the AMD allows one to authenticate $x$ via a signed form $(x, r, f(x,r))$ so that an
adversary who manipulates the signed value by adding an offset $\Delta$ cannot cause incorrect decoding
of some $x' \ne x$. The following concrete scheme from [ ] achieves near optimal tag size and we will
make use of it.

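Theorem 4.1. Let $\mathbb{F}$ be a finite field of size $q$ and characteristic $p$, and $d$ be a positive integer
such that $d + 2$ is not divisible by $p$. Then the function $f^{(d)}_{\mathrm{AMD}} : \mathbb{F}^d \times \mathbb{F} \to \mathbb{F}$ given by
$f^{(d)}_{\mathrm{AMD}}(x, r) = r^{d+2} + \sum_{i=1}^{d} x_i r^i$ is a $(\mathcal{G}, \frac{d+1}{q})$-AMD code with tag size $2\log q$, where $\mathcal{G} = (\mathbb{F}^d, \mathbb{F}, \mathbb{F})$.²

²Here we mean the additive group of the vector space $\mathbb{F}^d$.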
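The bound of Theorem 4.1 can be checked exhaustively on a tiny instance. The sketch below assumes a prime field $\mathbb{F}_q$ (so arithmetic is just integers mod $q$) and the tag function $f(x, r) = r^{d+2} + \sum_i x_i r^i$; the function names and the choice $q = 5$, $d = 2$ are illustrative assumptions, whereas the construction later in the text instantiates the field as $\mathbb{F}_{2^b}$.

```python
from itertools import product

def f_amd(x, r, q):
    """Tag r^(d+2) + sum_i x_i * r^i over the prime field F_q, with d = len(x)."""
    d = len(x)
    total = pow(r, d + 2, q)
    for i, xi in enumerate(x, start=1):
        total += xi * pow(r, i, q)
    return total % q

def amd_decode(word, q):
    """Return x if the tag verifies, else None (standing in for the failure symbol)."""
    x, r, s = word
    return x if f_amd(x, r, q) == s else None

def worst_case_forgeries(q, d):
    """Max over sources x and offsets (dx, dr, ds) of the number of seeds r
    for which the shifted word decodes to some x' != x."""
    worst = 0
    for x in product(range(q), repeat=d):
        for dx in product(range(q), repeat=d):
            shifted_x = tuple((a + b) % q for a, b in zip(x, dx))
            for dr, ds in product(range(q), repeat=2):
                hits = 0
                for r in range(q):
                    word = (shifted_x, (r + dr) % q, (f_amd(x, r, q) + ds) % q)
                    out = amd_decode(word, q)
                    if out is not None and out != x:
                        hits += 1
                worst = max(worst, hits)
    return worst
```

For $q = 5$ and $d = 2$ (so $d + 2 = 4$ is not divisible by 5), no offset fools the decoder for more than $d + 1 = 3$ of the 5 seeds, in line with the stated error $\frac{d+1}{q}$.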
4.3 Combining list decodable and AMD codes

Using a $(p, L)$-list decodable code $C$ of length $n$, for any error pattern $e$ of weight at most $pn$,
we can recover a list of at most $L$ messages that includes the correct message $m$. We would like to use
the stochastic portion of the encoding to allow us to unambiguously pick out $m$ from this short
list. The key insight is that if $C$ is a linear code, then the other (fewer than $L$) messages in the list
are all fixed offsets of $m$ that depend only on the error pattern $e$. So if prior to encoding by the
list-decodable code $C$, the messages are themselves encodings as per a good AMD code, and the
tag portion of the AMD code is good for these fixed $L$ or fewer offsets, then we can uniquely detect
$m$ from the list using the AMD code. If the tag size of the AMD code is negligible compared to
the message length, then the overall rate is essentially the same as that of the list-decodable code.
Since there exist binary linear $(p, L)$-list-decodable codes of rate approaching $1 - H(p)$ for large $L$,
this gives stochastic codes (in fact, strongly decodable stochastic codes) of rate approaching $1 - H(p)$
for correcting up to a fraction $p$ of worst-case additive errors.
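The key insight above, that for a linear code the offsets between the listed messages and the true message depend only on the error pattern, can be checked directly on a small example. The generator matrix `G`, the radius, and the helper names below are assumptions made for this sketch; any binary linear code would serve.

```python
from itertools import product

# Generator matrix of a small [6, 3] binary linear code (illustrative choice).
G = [(1, 0, 0, 1, 1, 0),
     (0, 1, 0, 1, 0, 1),
     (0, 0, 1, 0, 1, 1)]
K, N = len(G), len(G[0])

def xor(u, v):
    """Bitwise sum over F_2 of two equal-length tuples."""
    return tuple(a ^ b for a, b in zip(u, v))

def encode(m):
    """Linear encoding: XOR of the generator rows selected by the message bits."""
    word = (0,) * N
    for bit, row in zip(m, G):
        if bit:
            word = xor(word, row)
    return word

def list_offsets(m, e, radius):
    """Offsets m' XOR m over all messages m' whose codeword lies within
    `radius` of the received word encode(m) XOR e (brute-force list decoding)."""
    y = xor(encode(m), e)
    return {xor(m2, m) for m2 in product((0, 1), repeat=K)
            if sum(xor(encode(m2), y)) <= radius}
```

For a fixed error vector, `list_offsets` returns the same set for every message `m`; this is the property that lets an AMD tag, chosen independently of the error, rule out the spurious candidates.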

Theorem 4.2 (Stochastic codes from list decoding and AMD). Let $b, d$ be positive integers with
$d$ odd and $k = b(d + 2)$. Let $C : \mathbb{F}_2^k \to \mathbb{F}_2^n$ be the encoding function of a binary linear $(p,L)$-list
decodable code. Let $f^{(d)}_{\mathrm{AMD}}$ be the function from Theorem 4.1 for the choice $\mathbb{F} = \mathbb{F}_{2^b}$. Let $C'$ be the
stochastic binary code with encoding map $E : \{0,1\}^{bd} \times \{0,1\}^b \to \{0,1\}^n$ given by

$$E(m, r) = C\bigl(m, r, f^{(d)}_{\mathrm{AMD}}(m, r)\bigr).$$

Then if $\frac{d+1}{2^b} \le \frac{\delta}{L}$, the stochastic code $C'$ is strongly $p$-decodable with probability $1 - \delta$. If $C$ is
efficiently $(p,L)$-list decodable, then $C'$ is efficiently (and strongly) $p$-decodable with probability
$1 - \delta$.

Moreover, even when $e$ has weight greater than $pn$, the decoder detects this and outputs $\perp$ (a
decoding failure) with probability at least $1 - \delta$.

Note that the rate of $C'$ is $\frac{d}{d+2}$ times the rate of $C$.

Proof. Fix an error vector $e \in \{0,1\}^n$ and a message $m \in \{0,1\}^{bd}$. Suppose we pick a random $r$
and transmit $E(m,r)$, so that $y = E(m,r) + e$ was received.

The decoding function $D$, on input $y$, first runs the list decoding algorithm for $C$ to find a list of
$\ell \le L$ messages $m'_1, \dots, m'_\ell$ whose encodings are within distance $pn$ of $y$. It then decomposes $m'_i$ as
$(m_i, r_i, s_i)$ in the obvious way. The decoder then checks if there is a unique index $i \in \{1, 2, \dots, \ell\}$
for which $f^{(d)}_{\mathrm{AMD}}(m_i, r_i) = s_i$. If so, it outputs $(m_i, r_i)$, otherwise it outputs $\perp$.

Let us now analyze the above decoder $D$. First consider the case when $\mathrm{wt}(e) \le pn$. In this case
we want to argue that the decoder correctly outputs $(m, r)$ with probability at least $1 - \delta$ (over the
choice of $r$). Note that in this case one of the $m'_i$'s equals $(m, r, f^{(d)}_{\mathrm{AMD}}(m, r))$; say this happens for
$i = 1$ w.l.o.g. Therefore, the condition $f^{(d)}_{\mathrm{AMD}}(m_1, r_1) = s_1$ will be met and we only need to worry
about this happening for some $i > 1$ also.

Let $e_i = y - C(m'_i)$ be the associated error vectors for the messages $m'_i$. Note that $e_1 = e$.
By linearity of $C$, the $e_i$'s only depend on $e$; indeed if $c'_1, \dots, c'_\ell$ are all the codewords of $C$ within
distance $pn$ from $e$, then $e_i = c'_i + e$. Let $\Delta_i$ be the pre-image of $c'_i$, i.e., $c'_i = C(\Delta_i)$. Therefore we
have $m'_i = m'_1 + \Delta_i$ where the $\Delta_i$'s only depend on $e$. By the AMD property, for each $i > 1$, the
probability that $f^{(d)}_{\mathrm{AMD}}(m_i, r_i) = s_i$ over the choice of $r$ is at most $\frac{d+1}{2^b} \le \delta/L$. Thus with probability
at least $1 - \delta$, none of the checks $f^{(d)}_{\mathrm{AMD}}(m_i, r_i) = s_i$ for $i > 1$ succeed, and the decoder thus correctly
outputs $m_1 = m$.

In the case when $\mathrm{wt}(e) > pn$, the same argument shows that the check $f^{(d)}_{\mathrm{AMD}}(m_i, r_i) = s_i$ passes
with probability at most $\delta/L$ for each $i$ (including $i = 1$). So with probability at least $1 - \delta$ none
of the checks pass, and the decoder outputs $\perp$. □

Plugging into the above theorem the existence of binary linear $(p, O(1/\varepsilon))$-list-decodable codes
of rate $1 - H(p) - \varepsilon/2$, and picking $d = 2\lceil c_0/\varepsilon\rceil + 1$ for some absolute constant $c_0$, we can conclude
the following result on existence of stochastic codes achieving capacity for reliable communication
against additive errors.

Corollary 4.3. For every $p$, $0 < p < 1/2$, and every $\varepsilon > 0$, there exists a family of stochastic codes
of rate at least $1 - H(p) - \varepsilon$, which are strongly $p$-decodable with probability at least $1 - 2^{-c(\varepsilon,p)n}$,
where $n$ is the block length and $c(\varepsilon,p)$ is a constant depending only on $\varepsilon$ and $p$.
Moreover, when more than a fraction $p$ of errors occur, the code is able to detect this and report a
decoding failure with probability at least $1 - 2^{-c(\varepsilon,p)n}$.

Remark 1. For the above construction, if the decoding succeeds, it correctly computes in addition 
to the message m also the randomness r used at the encoder. So the construction also gives deter- 
ministic codes for the "average error criterion" where for every error vector, all but an exponentially 
small fraction of messages are communicated correctly. See Appendix B for a discussion of codes 
for this model and their relation to stochastic codes for additive errors. 



5 Overview of Explicit Constructions 

Codes for Additive Errors. Our result is obtained by combining several ingredients from pseu- 
dorandomness and coding theory. At a high level the idea (introduced by Lipton [27] in the context 
of shared randomness) is that if we permute the symbols of the codewords randomly after the error 
pattern is fixed, then the adversarial error pattern looks random to the decoder. Therefore, an 
explicit code Cbsc that can achieve capacity for the binary symmetric channel (such as Forney's 
concatenated code [12]) can be used to communicate on ADVp after the codeword's symbols are 
randomly permuted. This allows one to achieve capacity against adversarial errors when the en- 
coder and decoder share randomness that is unknown to the adversary causing the errors. But, 
crucially, this requires the decoder to know the random permutation used for encoding. 
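The scrambling idea can be illustrated with a short simulation: a burst error is fixed first, a permutation is drawn afterwards, and the errors are then counted per block. This is a sketch under stated assumptions: Python's `random` module stands in for the shared randomness, and the block length, the burst shape, and the convention `pi(v)[i] = v[pi[i]]` are choices made for illustration only.

```python
import random

def permute(pi, v):
    """Return the permuted vector pi(v), with pi(v)[i] = v[pi[i]]."""
    return [v[pi[i]] for i in range(len(v))]

def errors_per_block(e, block_len):
    """Number of error positions falling in each consecutive block."""
    return [sum(e[i:i + block_len]) for i in range(0, len(e), block_len)]

def scramble_burst(n, weight, block_len, seed):
    """Fix a burst error of the given weight, then apply a random permutation
    chosen independently of it; return per-block error counts before and after."""
    e = [1] * weight + [0] * (n - weight)   # adversarial burst, fixed in advance
    rng = random.Random(seed)               # randomness drawn after e is fixed
    pi = list(range(n))
    rng.shuffle(pi)
    return errors_per_block(e, block_len), errors_per_block(permute(pi, e), block_len)
```

Before scrambling, the whole error budget sits in the first blocks; afterwards the same number of errors is spread across blocks much as binary symmetric noise of the same rate would be, which is what lets a code designed for random errors be used.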

Our encoder communicates the random permutation (in encoded form) also as part of the overall 
codeword, without relying on any shared randomness, public key, or other "extra" information. The 
decoder must be able to figure out the permutation correctly, based solely on a noisy version of 
the overall codeword (that encodes the permutation plus the actual data). The seed used to pick 
this random permutation (plus some extra random seeds needed for the construction) is encoded 
by a low rate code that can correct several errors (say, a Reed-Solomon code) and this information 
is dispersed into randomly located blocks of the overall codeword (see Figure 1). The locations of 
the control blocks are picked by a "sampler" — the seed for this sampler is also part of the control 
information along with the seed for the random permutation. 

The key challenge is to ensure that the decoder can figure out which blocks encode the control
information, and which blocks consist of "data" bits from the codeword of Cbsc (the "payload"
codeword) that encodes the actual message. The control blocks (which comprise a tiny portion of
the overall codeword) are further encoded by a stochastic code (call it the control code) that can
correct somewhat more than a fraction $p$, say a fraction $p + \varepsilon$, of errors. These codes can have any
constant rate: since they encode a small portion of the message their rate is not so important,
so we can use explicit sub-optimal codes for this purpose.

Figure 1: Schematic description of encoder from Algorithm 1. (Final codeword: control information
accounts for an $\varepsilon$ fraction of blocks.)

Together with the random placement of the encoded control blocks, the control code ensures
that a reasonable ($\Omega(\varepsilon)$) fraction of the control blocks (whose encodings by the control code incur
fewer than a $p + \varepsilon$ fraction of errors) will be correctly decoded. Moreover, blocks with too many errors will be
flagged as erasures with high probability. The fraction of correctly recovered control blocks will be 
large enough that all the control information can be recovered by decoding the Reed-Solomon code 
used to encode the control information into these blocks. This recovers the permutation used to 
scramble the symbols of the concatenated codeword. The decoder can then unscramble the symbols 
in the message blocks and run the standard algorithm for the concatenated code to recover the 
message. 

One pitfall in the above approach is that message blocks could potentially get mistaken for 
corrupted control blocks and get decoded as erroneous control information that leads the whole 
algorithm astray. To prevent this, in addition to scrambling the symbols of the message blocks 
by a (pseudo) random permutation, we also add a pseudorandom offset (which is nearly t-wise 
independent for some t much larger than the length of the blocks). This will ensure that with 
high probability each message block will be very far from every codeword and therefore will not be 
mistaken for a control block. 

An important issue we have glossed over is that a uniformly random permutation of the $n$
bits of the payload codeword would take $\Omega(n \log n)$ bits to specify. This would make the control
information too big compared to the message length; we need it to be a tiny fraction of the
message length. We therefore use almost $t$-wise independent permutations for $t \approx \varepsilon n/\log n$. Such
permutations can be sampled with $\approx \varepsilon n$ random bits. We then make use of the fact that Cbsc
enables reliable decoding even when the error locations have such limited independence instead of
being a uniformly random subset of all possible locations.
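A quick calculation shows the size of the gap being discussed. The sketch below computes $\log_2(n!)$, the number of bits needed to name a uniformly random permutation, and compares it with a seed budget of about $\varepsilon n$ bits; the function names and the omission of hidden constants are assumptions of this illustration.

```python
import math

def bits_for_uniform_permutation(n):
    """log2(n!): bits needed to specify one of the n! permutations of n items."""
    return math.lgamma(n + 1) / math.log(2)

def seed_budget(n, eps):
    """Seed length of roughly eps * n bits (hidden constants omitted)."""
    return eps * n
```

For example, with $n = 1024$ a uniform permutation needs about 8769 bits, which already exceeds the codeword length, while the limited-independence seed budget for $\varepsilon = 0.1$ is only about 102 bits.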
Extending the Construction to log-space and poly-time channels. The construction for
additive channels does not work against more powerful channels for (at least) two reasons: 

(i) A more powerful channel may inject a large number of correctly formatted control blocks 
into the transmitted word (recall, each of the blocks is quite small). Even if the real control 
blocks are uncorrupted, the decoder will have trouble determining which of the correct-looking 
control blocks is in fact legitimate. 

(ii) Since the channel can decide the errors after seeing parts of the codeword, it may be able to 
learn which blocks of the codeword contain the control information and concentrate errors 
on those blocks. Similarly, we have to ensure that the channel does not learn about the
permutation used to scramble the payload codeword and thus cause a bad error pattern that 
cannot be decoded by the standard decoder for the concatenated code. 

The first obstacle is the easier one to get around, and we do so by using list-decoding: although 
the channel may inject spurious possibilities for the control information, the total number of such 
spurious candidates will be bounded. This ensures that after list decoding, provided at least a 
small fraction of the true control blocks do not incur too many errors, the list of candidates will 
include the correct control information with high probability. 

To overcome the second obstacle, we make sure, using appropriate pseudorandom generators 
and employing a "hybrid" argument, that the encoding of the message is indistinguishable from a 
random string by a computationally limited channel (such as one restricted to be online log-space 
or polynomial time bounded), even when the channel has knowledge of the message and certain 
parts of the control information. (For concreteness, let us focus on the online log-space case in the 
following discussion.) This ensures that the distribution of errors caused by the channel on the 
codeword is indistinguishable by online log-space tests from the distribution caused by the channel 
on a uniformly random string. Note that the latter distribution is oblivious to the codeword. If 
these error distributions were in fact statistically close (and not just close w.r.t. online log-space 
tests), successful decoding under oblivious errors would also imply successful decoding under the 
error distribution caused by the online log-space channel. 

The condition that enough control blocks have at most a fraction $p + \varepsilon$ of errors can be checked
in online log-space given non-uniform knowledge of the location of the control blocks. We use this 
together with the above indistinguishability to first prove that enough control blocks are correctly 
list-decoded, and thus the correct control information is among the candidates obtained by list 
decoding the control code. 

The next step is to show that the payload codeword is correctly decoded given knowledge 
of the correct control information. Towards this goal, we argue that certain events that imply 
successful decoding of the concatenated payload codeword, and which we showed to occur with 
high probability against oblivious errors, also happen with good probability against errors caused 
by the online log-space channel. The natural approach towards this is to show that assuming this is 
not the case, one can construct an online log-space distinguisher for the error distributions thereby 
contradicting their computational indistinguishability. Indeed, this can essentially be shown in the 
case of polynomial-time bounded channels. This part is harder for the space-bounded case, since 
Nisan's generator only ensures that the error distribution caused by the channel is indistinguishable 
from oblivious errors by online space-bounded machines. However, the unscrambling of the error 
vector (according to the permutation that was applied to the payload codeword) cannot be done in 
an online fashion. So we have to resort to an indirect argument based on showing limited independence
of certain events related to the payload decoding.

6 Explicit Codes of Optimal Rate for Additive Errors 

This section describes our construction of codes for additive errors. 

6.1 Ingredients 

Our construction uses a number of tools from coding theory and pseudorandomness. These are 
described in detail in Appendix A. Briefly, we use: 

• A constant-rate explicit stochastic code $\mathsf{SC} : \{0,1\}^b \times \{0,1\}^b \to \{0,1\}^{c_0 b}$, defined on blocks of
length $c_0 b = \Theta(\log N)$, that is efficiently decodable with probability $1 - c_1/N$ from a fraction
$p + O(\varepsilon)$ of additive errors. These codes are obtained via
Theorem 3.1 (see Proposition A.1 in the appendix).

• A rate $O(\varepsilon)$ Reed-Solomon code RS which encodes a message as the evaluation of a polynomial
at points $\alpha_1, \dots, \alpha_\ell$ in such a way that an algorithm RS-Decode can efficiently recover
the message given at least $\varepsilon\ell/4$ correct symbols and at most $\varepsilon\ell/24$ incorrect ones.

• A randomness-efficient sampler $\mathsf{Samp} : \{0,1\}^\sigma \to [N]^\ell$, such that for any subset $B \subseteq [N]$ of
size at least $\mu N$, the output set of the sampler intersects with $B$ in roughly a $\mu$ fraction of its
size, that is $|\mathsf{Samp}(s) \cap B| \approx \mu|\mathsf{Samp}(s)|$, with high probability over $s \in \{0,1\}^\sigma$. We use an
expander-based construction from Vadhan [ ].

• A generator $\mathsf{KNR} : \{0,1\}^\sigma \to S_n$ for an (almost) $t$-wise independent family of permutations of the
set $\{1, \dots, n\}$, that uses a seed of $\sigma = O(t \log n)$ random bits (Kaplan, Naor, and Reingold [ ]).

• A generator $\mathsf{POLY}_t : \{0,1\}^\sigma \to \{0,1\}^n$ for a $t$-wise independent distribution of bit strings of
length $n$, that uses a seed of $\sigma = O(t \log n)$ random bits.

• An explicit efficiently decodable, rate $R = 1 - H(p) - O(\varepsilon)$ code $\mathsf{REC} : \{0,1\}^{Rn} \to \{0,1\}^n$ that
can correct a $p$ fraction of $t$-wise independent errors, that is: for every message $m \in \{0,1\}^{Rn}$, and
every error vector $e \in \{0,1\}^n$ of Hamming weight at most $pn$, we have $\mathrm{REC\text{-}Decode}(\mathsf{REC}(m) + \pi(e)) = m$
with probability at least $1 - 2^{-\Omega(\varepsilon^2 t)}$ over the choice of a permutation $\pi \in_R \mathrm{range}(\mathsf{KNR})$.
(Here $\pi(e)$ denotes the permuted vector: $\pi(e)_i = e_{\pi(i)}$.) A standard family of concatenated codes
satisfies this property (Smith [35]).
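As an illustration of how a short seed can yield $t$-wise independence, the sketch below uses the classical polynomial-evaluation generator over a prime field: the seed is the coefficient vector of a polynomial of degree less than $t$, and the output is its evaluations at distinct points. This is a simplified stand-in, not the paper's $\mathsf{POLY}_t$: it outputs field symbols rather than bits, and the names `poly_generator` and `is_t_wise_uniform` are hypothetical.

```python
from itertools import combinations, product
from collections import Counter

def poly_generator(seed, n, q):
    """Evaluate the polynomial with coefficient list `seed` (low degree first)
    at the points 0..n-1 over the prime field F_q."""
    if n > q:
        raise ValueError("need n distinct evaluation points in F_q")
    return tuple(sum(c * pow(x, j, q) for j, c in enumerate(seed)) % q
                 for x in range(n))

def is_t_wise_uniform(t, n, q):
    """Exhaustively check that, over a uniform seed of t coefficients,
    every set of t output positions is jointly uniform on F_q^t."""
    outputs = [poly_generator(seed, n, q) for seed in product(range(q), repeat=t)]
    for positions in combinations(range(n), t):
        counts = Counter(tuple(out[i] for i in positions) for out in outputs)
        if len(counts) != q ** t or set(counts.values()) != {1}:
            return False
    return True
```

The check succeeds because any $t$ evaluations determine a polynomial of degree less than $t$ uniquely, so each pattern on $t$ positions arises from exactly one seed. The seed has $t \log q$ bits, which mirrors the $O(t \log n)$ seed length quoted above.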

6.2 Analysis 

The following (Theorem 3.2, restated) is our result on explicit construction of capacity- achieving 
codes for additive errors. 

Theorem 6.1. For every $p \in (0,1/2)$, and every $\varepsilon > 0$, the functions Encode, Decode (Algorithms
1 and 2) form an explicit, efficiently encodable and decodable stochastic code with rate
$R = 1 - H(p) - \varepsilon$ such that for every $m \in \{0,1\}^{RN}$ and error vector $e \in \{0,1\}^N$ of Hamming
weight at most $pN$, we have $\Pr_\omega[\mathrm{Decode}(\mathrm{Encode}(m; \omega) + e) = m] \ge 1 - \exp(-\Omega(\varepsilon^2 N/\log^2 N))$,
where $N$ is the block length of the code.
With all the ingredients described in Section A in place, we can describe and analyze the code
of Theorem 6.1. The encoding algorithm is given in Algorithm 1. The corresponding
decoder is given in Algorithm 2. Also, a schematic illustration of the encoding is in
Figure 1. The reader might find it useful to keep in mind the high level description from Section 5 
when reading the formal description. 

Starting the Proof of Theorem 6.1. First, note that the rate $R$ of the overall code approaches
the Shannon bound: $R$ is almost equal to the rate $R'$ of the code REC used to encode the actual
message bits $m$, since the encoded control information has length $O(\varepsilon N)$. The code REC needs to
correct a fraction $p + 25\Lambda\varepsilon$ of $t$-wise independent errors, so we can pick $R' \ge 1 - H(p) - O(\varepsilon)$. Now
the rate $R = \frac{R'N'}{N} = R'(1 - 24\Lambda\varepsilon) \ge 1 - H(p) - O(\varepsilon)$ (for small enough $\varepsilon > 0$).
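The rate bookkeeping can be reproduced with exact rational arithmetic. The sketch below computes the block counts set in step 1 of Algorithm 1 and the fraction of the codeword left for the payload; it ignores integer rounding, and the function name, dictionary keys, and sample values of $N$, $\varepsilon$, and $\Lambda$ are assumptions for illustration.

```python
from fractions import Fraction

def code_parameters(N, log_N, eps, Lam):
    """Block counts from Algorithm 1 (integer rounding ignored).

    N      : block length in bits
    log_N  : log2(N)
    eps    : slack parameter, as a Fraction
    Lam    : the constant Lambda (block length is Lam * log_N bits)
    """
    block_len = Lam * log_N                 # bits per block
    n = Fraction(N, block_len)              # total number of blocks
    ell = 24 * eps * N / log_N              # number of control blocks
    n_payload = n - ell                     # number of payload blocks n'
    N_payload = n_payload * block_len       # payload length N' in bits
    return {"n": n, "ell": ell, "n_payload": n_payload,
            "payload_fraction": N_payload / N}
```

The payload fraction always equals $1 - 24\Lambda\varepsilon$, so the overall rate is $R'(1 - 24\Lambda\varepsilon)$; for instance $\Lambda = 4$ and $\varepsilon = 1/1024$ leave $29/32$ of the codeword for the payload.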

We now turn to the analysis of the decoder. Fix a message $m \in \{0,1\}^{RN}$ and an error vector
$e \in \{0,1\}^N$ with Hamming weight at most $pN$. Suppose that we run Enc on $m$ and coins $\omega$
chosen independently of the pair $m, e$, and let $x = \mathrm{Enc}(m; \omega) + e$. The decoder parses $x$ into blocks
$x_1, \dots, x_{n'+\ell}$ of length $\Lambda \log N$, corresponding to the blocks output by the encoder.

The four lemmas below, proved in Section 6.3, show that the decoder recovers the control 
information correctly with high probability. We then show that the payload message is correctly 
recovered. The proof of the theorem is completed in Section 6.4. 

The lemmas illuminate the roles of the main pseudorandom objects in the construction. First, 
the sampler seed is used to ensure that errors are not concentrated on the control blocks, as captured 
in the next lemma: 

Definition 5 (Good sampler seeds). A sampled set $T$ is good for error vector $e$ if the fraction of
control blocks with relative error rate at most $p + \varepsilon$ is at least $\varepsilon/2$. □

Lemma 6.2 (Good sampler lemma). For any error vector $e$ of relative weight at most $p$, with
probability at least $1 - \exp(-\Omega(\varepsilon^3 N/\log N))$ over the choice of sampler seed $s_T$, the set $T$ is good
for $e$.

Given a good sampler seed, the properties of the stochastic code SC guarantee that many control 
blocks are correctly interpreted. Specifically: 

Lemma 6.3 (Control blocks lemma). For all $e, T$ such that $T$ is good for $e$, with probability at
least $1 - \exp(-\Omega(\varepsilon^3 N/\log N))$ over the random coins $(r_1, r_2, \dots, r_\ell)$ used by the $\ell$ SC encodings, we
have: (i) the number of control blocks correctly decoded by SC-Decode is at least $\varepsilon\ell/4$, and (ii) the
number of erroneously decoded control blocks is less than $\varepsilon\ell/24$.

(By erroneously decoded, we mean that SC-Decode outputs neither $\perp$ nor the correct message.)
The offset $\Delta$ is then used to ensure that payload blocks are not mistaken for control blocks:

Lemma 6.4 (Payload blocks lemma). For all $m, e, s_T, s_\pi$, with probability at least $1 - 2^{-\Omega(\varepsilon^2 N/\log^2 N)}$
over the offset seed $s_\Delta$, the number of payload blocks incorrectly accepted as control blocks by
SC-Decode is less than $\varepsilon\ell/24$.

The two previous lemmas imply that the Reed-Solomon decoder will, with high probability, be 
able to recover the control information. Specifically: 

Lemma 6.5 (Control Information Lemma). For any $m$ and $e$, with probability $1 - 2^{-\Omega(\varepsilon^2 N/\log^2 N)}$
over the choice of the control information and the coins of SC, the control information is correctly
recovered, that is $(\tilde{s}_T, \tilde{s}_\Delta, \tilde{s}_\pi) = (s_T, s_\Delta, s_\pi)$.
Algorithm 1. Encode: On input parameters $N, p, \varepsilon$ (with $p + \varepsilon < 1/2$), and message $m \in \{0,1\}^{RN}$,
where $R = 1 - H(p) - O(\varepsilon)$.

1: $\Lambda \leftarrow 2c_0$   ▷ Here $c_0$ is the expansion of the stochastic code from Theorem 3.1 that can
                               correct a fraction $p + \varepsilon$ of errors.
   $n \leftarrow N/(\Lambda \log N)$   ▷ The final codeword consists of $n$ blocks of length $\Lambda \log N$.
   $\ell \leftarrow 24\varepsilon N/\log N$   ▷ The control codeword is $\ell$ blocks long.
   $n' \leftarrow n - \ell$ and $N' \leftarrow n' \cdot (\Lambda \log N)$   ▷ The payload codeword is $n'$ blocks long (i.e. $N'$ bits).

Phase 1: Generate control information

2: Select seeds $s_\pi, s_\Delta, s_T$ uniformly in $\{0,1\}^{\varepsilon^2 N}$.

3: $\omega \leftarrow (s_\pi, s_\Delta, s_T)$   ▷ Total length $|\omega| = 3\varepsilon^2 N$.

Phase 2: Encode control information

4: Encode $\omega$ with a Reed-Solomon code RS to get symbols $(a_1, \dots, a_\ell)$.
   ▷ RS is a rate-$\varepsilon/8$ Reed-Solomon code of length $\ell$ symbols ($\ell \cdot \log N = 24\varepsilon N$ bits) which
     evaluates polynomials at points $(\alpha_1, \dots, \alpha_\ell)$ in a field $\mathbb{F}$ of size $\approx N$.

5: Encode each symbol together with its evaluation point: For $i = 1, \dots, \ell$, do

   • $A_i \leftarrow (\alpha_i, a_i)$
     ▷ We add location information to each RS symbol to handle insertions and deletions.

   • $C_i \leftarrow \mathsf{SC}(A_i, r_i)$, where $r_i$ is random of length $2\log N$ bits.
     ▷ $\mathsf{SC} = \mathsf{SC}_{2\log N,\, p+\varepsilon} : \{0,1\}^{2\log N} \times \{0,1\}^{2\log N} \to \{0,1\}^{\Lambda \log N}$ is a stochastic
       code that can correct a fraction $(p + \varepsilon)$ of additive errors with probability
       $1 - c_1/N^2 > 1 - 1/N$ as per Proposition A.1.

Phase 3: Generate the payload codeword

6: Encode $m$ using a code that corrects random errors:

   • $P \leftarrow \mathsf{REC}(m)$   ▷ $\mathsf{REC} : \{0,1\}^{R'N'} \to \{0,1\}^{N'}$ is a code that corrects a $p + 25\Lambda\varepsilon$ fraction of
                                      $t$-wise independent errors, as per Proposition A.7. Here $R' = RN/N'$.

7: Expand the seeds $s_T, s_\Delta, s_\pi$ to get a set $T = \mathsf{Samp}(s_T)$, offset $\Delta = \mathsf{POLY}(s_\Delta)$, and permutation
   $\pi = \mathsf{KNR}(s_\pi)$.

8: Scramble the payload codeword:

   • Compute $\pi^{-1}(P)$ (bits of $P$ permuted according to $\pi^{-1}$).

   • $Q \leftarrow \pi^{-1}(P) \oplus \Delta$

   • Cut $Q$ into $n'$ blocks $B_1, \dots, B_{n'}$ of length $\Lambda \log N$ bits.

Phase 4: Interleave blocks of payload codeword and control codeword

9: Interleave control blocks $C_1, \dots, C_\ell$ with payload blocks $B_1, \dots, B_{n'}$, using control blocks in positions
   from $T$ and payload blocks in remaining positions.
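Step 5 tags every Reed-Solomon symbol with its evaluation point. The sketch below shows one consequence of that design: the message polynomial can be recovered from any sufficiently large set of correct (point, value) pairs, in any order, even if other blocks are missing. It is an erasure-only illustration using Lagrange interpolation over a prime field; the RS-Decode of the text also tolerates incorrect pairs, which this sketch does not attempt. The names `rs_encode` and `interpolate` and the field size 13 are assumptions.

```python
def rs_encode(coeffs, points, q):
    """Evaluate the message polynomial (coefficients low degree first) at each
    point over the prime field F_q, emitting (point, value) pairs."""
    return [(a, sum(c * pow(a, j, q) for j, c in enumerate(coeffs)) % q)
            for a in points]

def interpolate(pairs, q):
    """Lagrange interpolation over F_q: coefficients of the unique polynomial of
    degree < len(pairs) through the given pairs (points must be distinct)."""
    k = len(pairs)
    coeffs = [0] * k
    for i, (xi, yi) in enumerate(pairs):
        basis = [1]          # coefficients of prod_{j != i} (X - xj)
        denom = 1            # prod_{j != i} (xi - xj)
        for j, (xj, _) in enumerate(pairs):
            if j == i:
                continue
            # Multiply the running product by (X - xj).
            basis = [(prev - xj * cur) % q
                     for prev, cur in zip([0] + basis, basis + [0])]
            denom = denom * (xi - xj) % q
        scale = yi * pow(denom, -1, q) % q   # modular inverse (Python 3.8+)
        for deg, b in enumerate(basis):
            coeffs[deg] = (coeffs[deg] + scale * b) % q
    return coeffs
```

For a degree-2 message polynomial, any three surviving pairs suffice, regardless of which blocks they came from or the order in which they are found.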
Algorithm 2. Decode: On input $x$ of length $N$:

1: Cut $x$ into $n' + \ell$ blocks $x_1, \dots, x_{n'+\ell}$ of length $\Lambda \log N$ each.

2: Attempt to decode control blocks: For $i = 1, \dots, n' + \ell$, do

   • $\tilde{F}_i \leftarrow$ SC-Decode($x_i$).
     ▷ With high prob., non-control blocks are rejected (Lemma 6.4), and control
       blocks are either correctly decoded or discarded (Lemma 6.3).

   • If $\tilde{F}_i \ne \perp$, then parse $\tilde{F}_i$ as $(\tilde{\alpha}_i, \tilde{a}_i)$, where $\tilde{\alpha}_i, \tilde{a}_i \in \mathbb{F}$.

3: $(\tilde{s}_\pi, \tilde{s}_\Delta, \tilde{s}_T) \leftarrow$ RS-Decode(pairs $(\tilde{\alpha}_i, \tilde{a}_i)$ output above).
   ▷ Control information is recovered w.h.p. (Lemma 6.5).

4: Expand the seeds $\tilde{s}_T, \tilde{s}_\Delta, \tilde{s}_\pi$ to get set $T$, offset $\Delta$, and permutation $\pi$.

5: $\tilde{Q} \leftarrow$ concatenation of blocks $x_i$ not in $T$.
   ▷ Fraction of errors in $\tilde{Q}$ is at most $p + O(\varepsilon)$.

6: $\tilde{P} \leftarrow \pi(\tilde{Q} \oplus \Delta)$   ▷ If control info is correct, then errors in $\tilde{P}$ are almost $t$-wise independent.

7: $\tilde{m} \leftarrow$ REC-Decode($\tilde{P}$)
   ▷ Run the decoder from Proposition A.7.



Remark 2. It would be interesting to achieve an error probability of $2^{-\Omega_\varepsilon(N)}$, i.e., a positive "error
exponent," in Theorem 6.1 instead of the $2^{-\Omega_\varepsilon(N/\log^2 N)}$ bound we get. A more careful analysis
(perhaps one that works with almost $t'$-wise independent offset $\Delta$) can probably improve our error
probability to $2^{-\Omega_\varepsilon(N/\log N)}$, but going further using our approach seems difficult. The existential
result due to Csiszar and Narayan [i] achieves a positive error exponent for all rates less than 
capacity, as does our existence proof using list decoding in Section 4.3. 

Remark 3. A slight modification of our construction gives codes for the "average error criterion,"
in which the code is deterministic but the message is assumed to be unknown to the channel and the 
goal is to ensure that for every error vector most messages are correctly decoded; see Theorem B.3 
in Appendix B. 

6.3 Proofs of Lemmas used in Theorem 6.1 

Proof of Lemma 6.2. Let $B \subseteq [n] = [n' + \ell]$ be the set of blocks that contain a $(p + \varepsilon)$ or smaller
fraction of errors. We first prove that $B$ must occupy at least an $\varepsilon$ fraction of the total number of
blocks: to see why, let $\gamma$ be the proportion of blocks which have error rate at most $(p + \varepsilon)$. The
total fraction of errors in $x$ is then at least $(1 - \gamma)(p + \varepsilon)$. Since this fraction is at most $p$ by
assumption, we must have $1 - \gamma \le p/(p + \varepsilon)$. So $\gamma \ge \varepsilon/(p + \varepsilon) > \varepsilon$.
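The averaging argument can be sanity-checked against an adversary that spends its whole error budget making as many blocks "bad" as possible. The sketch below uses exact fractions; the helper names and the sample parameters are assumptions for illustration.

```python
from fractions import Fraction

def good_block_fraction(errors, block_len, p, eps):
    """Fraction of blocks whose relative error rate is at most p + eps."""
    good = sum(Fraction(e, block_len) <= p + eps for e in errors)
    return Fraction(good, len(errors))

def concentrated_attack(n_blocks, block_len, p, eps):
    """Error counts per block when the budget of p * (total bits) errors is spent
    on as many blocks as possible with error rate just above p + eps."""
    budget = int(p * n_blocks * block_len)
    bad_cost = int((p + eps) * block_len) + 1   # cheapest count exceeding p + eps
    n_bad = min(n_blocks, budget // bad_cost)
    return [bad_cost] * n_bad + [0] * (n_blocks - n_bad)
```

With $p = 1/5$, $\varepsilon = 1/20$, and 100 blocks of 100 bits, the adversary can spoil at most 76 blocks, leaving a $6/25$ fraction of good blocks, which is above the guaranteed $\varepsilon/(p + \varepsilon) = 1/5$.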

Next, we show that the number of control blocks that have error rate at most $p + \varepsilon$ cannot be too
small. The error $e$ is fixed before the encoding algorithm is run, and so the sampler seed $s_T$ is chosen
independently of the set $B$. Thus, the fraction of control blocks in $B$ will be roughly $\varepsilon$. Specifically,
we can apply Proposition A.4 with $\mu = \varepsilon$ (since $B$ occupies at least an $\varepsilon$ fraction of the set of blocks),
$\theta = \varepsilon/2$ and $\sigma = \varepsilon^2 N$. We get that the error probability $\gamma$ is $\exp(-\Omega(\theta^2 \ell)) = \exp(-\Omega(\varepsilon^3 N/\log N))$.
(Note that for constant $\varepsilon$, the seed length $\sigma = \varepsilon^2 N \ge \log N + \ell\log(1/\varepsilon)$ is large enough for the
proposition to apply.) □



Proof of Lemma 6.3. Fix $e$ and the sampled set $T$ which is good for $e$. Consider a particular
received block $x_i$ that corresponds to control block $j$, that is, $x_i = C_j + e_i$. The key observation
is that the error vector $e_i$ depends on $e$ and the sampled set $T$, but it is independent of the
randomness used by SC to generate $C_j$. Given this observation, we can apply Proposition A.1
directly:

(a) If block $i$ has error rate at most $p + \varepsilon$, then SC-Decode decodes correctly with probability
at least $1 - c_1/N^2 \ge 1 - 1/N$ over the coins of SC.

(b) If block $i$ has error rate more than $p + \varepsilon$, then SC-Decode outputs $\perp$ with probability at
least $1 - c_1/N^2 \ge 1 - 1/N$ over the coins of SC.

Note that in both statements (a) and (b), the probability need only be taken over the coins of SC.

Consider $Y$, the number of control blocks that either (i) have "low" error rate (at most $p + \varepsilon$) yet
are not correctly decoded, or (ii) have high error rate, and are not decoded as $\perp$. Because statements
(a) and (b) above depend only on the coins of SC, and these coins are chosen independently in
each block, the variable $Y$ is statistically dominated by a sum of $\ell$ independent Bernoulli variables
with probability $1/N$ of being 1. Thus $E[Y] \le \ell/N < 1$. By a standard additive Chernoff bound,
the probability that $Y$ exceeds $\varepsilon\ell/24$ is at most $\exp(-\Omega(\varepsilon^2 \ell))$. The bound on $Y$ implies both the
bounds in the lemma. □

Proof of Lemma 6.4. Consider a block $x_i$ that corresponds to payload block $j$, that is, $x_i = B_j + e_i$.
Fix $e$, $s_T$, and $s_\pi$. The offset $\Delta$ is independent of these, and so we may write $x_i = u_i + \Delta_j$, where
$u_i$ is fixed independently of $\Delta_j$. Since $\Delta$ is a $t'$-wise independent string with $t' = \Omega(\varepsilon^2 N/\log N)$
much greater than the size $\Lambda \log N$ of each block, the string $\Delta_j$ is uniformly random in $\{0,1\}^{\Lambda \log N}$.
Hence, so is $x_i$. By Proposition A.1 we know that on input a random string, SC-Decode outputs
$\perp$ with probability at least $1 - c_1/N^2 \ge 1 - 1/N$.

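Moreover, the $t'$-wise independence of the bits of $\Delta$ implies $\frac{t'}{\Lambda \log N}$-wise independence of the
blocks of $\Delta$. Define $t'_{\mathrm{blocks}} = \min\{\frac{t'}{\Lambda \log N}, \frac{\varepsilon\ell}{96}\}$. Note that $\Omega(\varepsilon^2 N/\log^2 N) \le t'_{\mathrm{blocks}} \le \frac{\varepsilon\ell}{96}$. The decisions
made by SC-Decode on payload blocks are $t'_{\mathrm{blocks}}$-wise independent. Let $Z$ denote the number of
payload blocks that are incorrectly accepted as control blocks by SC-Decode. We have $E[Z] \le \frac{n'}{N} \le \varepsilon\ell/48$
(for large enough $N$).

We can apply a concentration bound of Bellare and Rompel [1, Lemma 2.3] using $t = t'_{\mathrm{blocks}}$,
$\mu = E[Z] \le \frac{\varepsilon\ell}{48}$, $A = \frac{\varepsilon\ell}{48}$, to obtain the bound

$$\Pr\left[Z \ge \frac{\varepsilon\ell}{24}\right] \le 8\left(\frac{t\mu + t^2}{A^2}\right)^{t/2} \le (\log N)^{-\Omega(t'_{\mathrm{blocks}})} \le e^{-\Omega(\varepsilon^2 N \log\log N/\log^2 N)}.$$

This bound implies the lemma statement. □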
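For readers who want to see how quickly the limited-independence tail bound decays, the sketch below evaluates it numerically. It assumes the commonly quoted form of the Bellare-Rompel bound for sums of $t$-wise independent $[0,1]$-valued variables, $\Pr[|Z - \mu| \ge A] \le 8\left(\frac{t\mu + t^2}{A^2}\right)^{t/2}$ for even $t \ge 4$; the function name is hypothetical.

```python
def bellare_rompel_bound(t, mu, A):
    """Evaluate 8 * ((t*mu + t^2) / A^2)^(t/2), the assumed form of the tail bound
    for a sum of t-wise independent variables with mean mu and deviation A."""
    if t < 4 or t % 2 != 0:
        raise ValueError("the bound is stated for even t >= 4")
    return 8 * ((t * mu + t * t) / (A * A)) ** (t / 2)
```

When $A$ is much larger than $t$, the base of the power is small and each additional unit of independence multiplies the bound by a small factor, which is the source of the $(\log N)^{-\Omega(t)}$ decay in the proof.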

Proof of Lemma 6.5. Suppose the events of Lemmas 6.3 and 6.4 occur, that is, for at least $\varepsilon\ell/4$
of the control blocks the recovered value $F_i$ is correct, at most $\varepsilon\ell/24$ of the control blocks are
erroneously decoded, and at most $\varepsilon\ell/24$ of the payload blocks are mistaken for control blocks.

Because the blocks of the control information come with the (possibly incorrect) evaluation
points $\alpha_i$, we are effectively given a codeword in the Reed-Solomon code defined for the related
point set $\{\alpha_i\}$. Now, the degree of the polynomial used for the original RS encoding is
$d^* = |\omega|/\log(N) - 1 < 3\varepsilon^2 N/\log N = \varepsilon\ell/8$. Of the pairs $(\alpha_i, a_i)$ decoded by SC-Decode, we know at
least $\varepsilon\ell/4$ are correct (these pairs will be distinct), and at most $2 \cdot \varepsilon\ell/24$ are incorrect (some of these pairs
may occur more than once, or even collide with one of the correct pairs). If we eliminate any duplicate
pairs and then run the decoding algorithm from Proposition A.2, the control information $\omega$ will be
correctly recovered as long as the number of correct symbols exceeds the number of wrong symbols
by at least $d^* + 1$. This requirement is met if $\varepsilon\ell/4 - 2 \times \varepsilon\ell/24 \ge d^* + 1$. This is indeed the case since
$d^* < \varepsilon\ell/8$.

Taking a union bound over the events of Lemmas 6.3 and 6.4, we get that the probability that
the control information is correctly decoded is at least $1 - \exp(-\Omega(\varepsilon^2 N/\log^2 N))$, as desired. □
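
The counting step in the proof of Lemma 6.5 can be checked with exact arithmetic. The sketch below assumes the bounds read as at least $\varepsilon\ell/4$ correct pairs and at most $2 \cdot \varepsilon\ell/24$ incorrect pairs, with $d^* < \varepsilon\ell/8$. The function names are illustrative and do not come from the paper.

```python
from fractions import Fraction

def agreement_margin(eps_ell):
    """Correct minus wrong pairs, assuming >= eps*l/4 correct and <= 2*eps*l/24 wrong."""
    correct = Fraction(eps_ell) / 4
    wrong = 2 * Fraction(eps_ell) / 24
    return correct - wrong  # simplifies to eps*l/6

def unique_decoding_condition(eps_ell, d_star):
    """True when the margin is at least d* + 1, the requirement quoted in the proof."""
    return agreement_margin(eps_ell) >= d_star + 1
```

Under these assumptions the margin is $\varepsilon\ell/6$, which exceeds $\varepsilon\ell/8$, so any degree $d^* < \varepsilon\ell/8$ leaves room for the extra $+1$ once $\varepsilon\ell$ is large enough.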



6.4 Completing the Proof of Main Theorem 6.1 

Proof of Theorem 6.1. We will first prove that the decoding of the payload codeword succeeds
assuming the correct control information $\omega = (s_\pi, s_\Delta, s_T)$ is handed directly to the decoder, i.e., in
the "shared randomness" setting. We will then account for the fact that we must condition on the
correct recovery of the control information $\omega$ by the first stage of the decoder.

Fix a message m, error vector e, and sampler seed $s_T$, and let $e_Q$ be the restriction of e to the
payload codeword, i.e., blocks not in T. The relative weight of $e_Q$ is at most
$\frac{pN}{N - \ell\Lambda\log N} = \frac{p}{1 - 24\varepsilon\Lambda} \le p(1 + 25\Lambda\varepsilon)$ (for sufficiently small $\varepsilon$).

Now since $s_\pi$ is selected independently from T, the permutation $\pi$ is independent of the payload
error $e_Q$. Consider the string $\tilde{P}$ that is input to the REC decoder. We can write
$\tilde{P} = \pi(\tilde{Q} \oplus \Delta) = \pi(Q \oplus e_Q \oplus \Delta)$. Because a permutation of the bit positions is a linear
permutation of the space of bit strings, we get
$\tilde{P} = \pi(Q \oplus \Delta) \oplus \pi(e_Q) = P \oplus \pi(e_Q)$.
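
The linearity used in this step (permuting bit positions commutes with XOR) is easy to confirm on random inputs. The following is a minimal sketch; the helper names `permute`, `xor`, and `check_linearity` are hypothetical and only illustrate the identity.

```python
import random

def permute(bits, perm):
    """Apply a permutation of positions: output[i] = bits[perm[i]]."""
    return [bits[j] for j in perm]

def xor(a, b):
    """Bitwise XOR of two equal-length bit lists."""
    return [x ^ y for x, y in zip(a, b)]

def check_linearity(n=16, seed=0):
    """Check pi(Q xor e xor Delta) == pi(Q xor Delta) xor pi(e) on random data."""
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    q = [rng.randint(0, 1) for _ in range(n)]
    e = [rng.randint(0, 1) for _ in range(n)]
    delta = [rng.randint(0, 1) for _ in range(n)]
    lhs = permute(xor(xor(q, e), delta), perm)
    rhs = xor(permute(xor(q, delta), perm), permute(e, perm))
    return lhs == rhs
```

The identity holds for every permutation and every choice of strings, so the check returns `True` regardless of the seed.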

Thus the input to REC is corrupted by a fraction of at most $p(1 + 25\Lambda\varepsilon)$ errors which are t-wise
independent, in the sense of Proposition A.7 [ ]. Thus, with probability at least
$1 - e^{-\Omega(\varepsilon^2 t)} = 1 - e^{-\Omega(\varepsilon^4 N/\log N)}$, the message m is correctly recovered by Decode.

In the actual decoding, the control information is not handed directly to the decoder. Let $\hat\omega$
be the candidate control information recovered by the decoder (in Step 8 of the algorithm). The
above suite of lemmas (Lemmas 6.2, 6.3, 6.4, and 6.5) shows that the control information is correctly
recovered, i.e., $\hat\omega = \omega$, with probability at least $1 - \exp(-\Omega(\varepsilon^2 N/\log^2 N))$.

The overall probability of success is given by

$$\Pr_{\omega}[\text{payload decoding succeeds with control information } \hat\omega],$$

which is at least

$$\Pr_{\omega}[\hat\omega = \omega \wedge \text{payload decoding succeeds with control information } \hat\omega]$$
$$= \Pr_{\omega}[\hat\omega = \omega \wedge \text{payload decoding succeeds with control information } \omega]$$
$$\ge 1 - \Pr_{\omega}[\hat\omega \ne \omega] - \Pr_{\omega}[\text{payload decoding fails given } \omega]$$
$$\ge 1 - \exp(-\Omega(\varepsilon^2 N/\log^2 N)) - \exp(-\Omega(\varepsilon^4 N/\log N)).$$

Because $\varepsilon$ is a constant relative to log N, it is the former failure term that dominates. This completes
the analysis of the decoder and the proof of Theorem 6.1. □



7 Capacity-achieving codes for online space-bounded channels 

In this section, we outline a Monte Carlo algorithm that, for any desired error fraction $p \in (0, 1/2)$,
produces a code of rate close to $1 - H(p)$ which can be efficiently list decoded from errors caused
by an arbitrary randomized online log-space channel that corrupts at most a fraction p of symbols
with high probability. Recall that for p > 1/4, resorting to list decoding is necessary even for very
simple (constant space) channels. If the channel is allowed a space bound $S = o(N/\log N)$, our
construction and decoding times are polynomial in the block length N of the code and $2^S$.

7.1 Channel models and branching programs 

We model space-bounded online channels as a restricted form of bounded-width branching pro- 
grams. Galil et al. [13] formulated a uniform version of this model, with finite automata replacing 
branching programs. We use a nonuniform model since it simplifies several proofs and captures a 
broader class of channels, including AVC's. 

Definition 6 (Branching programs). A layered, oblivious branching program of width $2^S$ and
length $\ell$ is a sequence of $\ell$ transition functions $F_i : Q \times \{0,1\} \to Q$, where $Q = \{0,1\}^S$ is a
set of $2^S$ states, together with a start state $q_0 \in Q$, an input map $I : [\ell] \to [N]$, and an output
map $\mathrm{Out} : Q \to \{0,1\}$. On input a binary string $(x_1 x_2 \cdots x_N) \in \{0,1\}^N$, the program computes
$q_i = F_i(q_{i-1}, x_{I(i)})$ for i = 1 to $\ell$. The (binary) output of the program is $\mathrm{Out}(q_\ell)$.

We define an online branching program of width $2^S$ as a special case of the above, in which
$\ell = N$ and I is the identity function, so that $q_i = F_i(q_{i-1}, x_i)$. Such a program is essentially a
nonuniform finite automaton, for which the transition function can change from symbol to symbol.
A randomized online branching program is a probability distribution over (deterministic) online
branching programs (of the same input length and width bound).
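
To make the definition concrete, here is a minimal sketch of how a deterministic online branching program could be evaluated. States are represented as Python integers rather than S-bit strings, and the two example programs (`parity_program`, `select_bit_program`) are illustrative assumptions, not constructions from the paper.

```python
def run_online_bp(transitions, out, q0, x):
    """Evaluate an online branching program: q_i = F_i(q_{i-1}, x_i), output Out(q_N)."""
    if len(transitions) != len(x):
        raise ValueError("need exactly one transition function per input bit")
    q = q0
    for step, bit in zip(transitions, x):
        q = step(q, bit)
    return out(q)

def parity_program(n):
    """Uniform width-2 program (S = 1): the state is the parity of the bits read so far."""
    step = lambda q, b: q ^ b
    return [step] * n, (lambda q: q), 0

def select_bit_program(n, k):
    """Nonuniform width-2 program: layer k stores x_k, every other layer keeps the state."""
    def make(i):
        if i == k:
            return lambda q, b: b
        return lambda q, b: q
    return [make(i) for i in range(n)], (lambda q: q), 0
```

The second example shows what nonuniformity buys: the transition function at position k differs from all the others, which a finite automaton with a single transition function and two states could not do for arbitrary k.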

To model space-bounded channels (as opposed to Boolean functions), we augment the standard 
branching program model with the ability to output bits at each phase. 

Definition 7 (Space bounded channels (Def. 1 restated)). An online space-S channel is a read-once
width-$2^S$ branching program that outputs one bit at each computation step. Specifically, let
$Q = \{0,1\}^S$ be a set of $2^S$ states. For input length N, the channel is given by a sequence of N
transition functions $F_i : Q \times \{0,1\} \to Q \times \{0,1\}$, for i = 1 to N, along with a start state $q_0 \in Q$.
On input $x = (x_1 x_2 \cdots x_N) \in \{0,1\}^N$, the channel computes $(q_i, y_i) = F_i(q_{i-1}, x_i)$ for i = 1 to N.
The output of the channel, which is denoted A(x), is $y = (y_1 y_2 \cdots y_N) \in \{0,1\}^N$.
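
As an illustration of this definition, the sketch below runs a deterministic online channel and counts the bits it flips. The example channel (`budgeted_flip_channel`, which flips the first few ones it sees) is a hypothetical instance chosen for simplicity; its state is the number of flips used so far, so it needs roughly $\log_2(\text{budget} + 1)$ bits of space.

```python
def run_channel(transitions, q0, x):
    """Compute (q_i, y_i) = F_i(q_{i-1}, x_i) for each position and return y."""
    q = q0
    y = []
    for step, bit in zip(transitions, x):
        q, out_bit = step(q, bit)
        y.append(out_bit)
    return y

def error_weight(x, y):
    """Hamming weight of x xor y, i.e. the number of flipped bits."""
    return sum(a ^ b for a, b in zip(x, y))

def budgeted_flip_channel(n, budget):
    """Example channel: flip each 1 to 0 until `budget` flips have been spent."""
    def step(q, bit):
        if bit == 1 and q < budget:
            return q + 1, 0
        return q, bit
    return [step] * n, 0
```

For instance, with a budget of 2 the input `[1, 0, 1, 1, 0, 1]` is mapped to `[0, 0, 0, 1, 0, 1]`, an error pattern of weight 2.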

A randomized (online space-S) channel is a probability distribution over the space of deterministic
(online space-S) channels. For a given input x, such a channel induces a corresponding distribution
on outputs. A randomized channel A is pN-bounded with probability $1 - \beta$ if, for all inputs
$x \in \{0,1\}^N$, with probability at least $1 - \beta$, the channel flips at most pN bits of x, that is

$$\Pr[\mathrm{weight}(x \oplus A(x)) > pN] < \beta.$$

Remark 4 (Bounded Lookahead). We may also consider a model in which the channel bases its
decision about which bits to flip not only on positions seen so far, but some number of positions
in the future. A channel with look-ahead t is specified by N + t transition functions. The input is
augmented with t extra dummy bits, i.e. $x' = x \| 0^t$, and the output of the channel is the last N
bits produced by the transition functions, that is $y = y_{t+1} y_{t+2} \cdots y_{t+N}$. The results of this section
extend directly to channels with O(S) look-ahead.

7.2 Nisan's pseudorandom generator 

The encoding function of our code will make use of Nisan's pseudorandom generator for fooling 
space bounded algorithms, or in the non-uniform setting, bounded width branching programs. We 
first define the notion of indistinguishability of two distributions relative to a function. 

Definition 8 (Indistinguishability). For a given (possibly randomized) Boolean function A on some
domain D and two random variables X, Y taking values in D, we write $X \sim^{\eta}_{A} Y$ if

$$|\Pr(A(X) = 1) - \Pr(A(Y) = 1)| \le \eta.$$
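
For small domains the distinguishing advantage in this definition can be computed exactly by enumeration. The sketch below does so for random variables that are uniform over finite lists of samples; the tests and sample sets are toy assumptions chosen only to illustrate the quantity being bounded.

```python
from itertools import product

def acceptance_probability(test, samples):
    """Pr[test(X) = 1] for X uniform over the list `samples`."""
    return sum(test(x) for x in samples) / len(samples)

def advantage(test, samples_x, samples_y):
    """|Pr[test(X) = 1] - Pr[test(Y) = 1]| for X, Y uniform over the given lists."""
    return abs(acceptance_probability(test, samples_x)
               - acceptance_probability(test, samples_y))

ALL_3BIT = list(product((0, 1), repeat=3))              # uniform 3-bit strings
EVEN_PARITY = [x for x in ALL_3BIT if sum(x) % 2 == 0]  # uniform even-parity strings

def parity_test(x):
    return sum(x) % 2

def first_bit_test(x):
    return x[0]
```

Here the parity test separates the two distributions with advantage 1/2, while the first-bit test has advantage 0. Indistinguishability is always relative to the class of tests considered, which is why the paper restricts attention to bounded-width online branching programs.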

Theorem 7.1 (Nisan's PRG [29]). For integers S, m, there exists $s = O(S \log m)$ and a function
$\mathrm{Nis} : \{0,1\}^s \to \{0,1\}^m$ such that for every randomized online branching program B on m
inputs of width $2^S$,

$$\mathrm{Nis}(U_s) \sim^{2^{-S}}_{B} U_m,$$

where $U_t$ denotes the uniform distribution on $\{0,1\}^t$. For such a generator, s is called its seed
length, and $2^{-S}$ its error.

7.3 Code construction and ingredients 

We will use the same high level approach from our construction for the additive errors case, with 
some components changed, and with a seed in the control information to obtain an offset that looks 
uniformly random to online log-space bounded channels. 

Parameters. Input parameters of the construction: N, p, S, $\varepsilon$, where

(i) N is the block length of the final code,

(ii) pN is the bound on the number of errors introduced (0 < p < 1/2) by the channel w.h.p.,

(iii) S is a bound on the space of the online channel (as per Definition 7), such that S is both
$\Omega(\log N)$ and $o(N/\log N)$,

(iv) $\varepsilon$ is a measure of how far the rate is from the optimal bound of $1 - H(p)$ (that is, the rate
must be at least $1 - H(p) - \varepsilon$). We will assume $0 < 2\varepsilon < 1/2 - p$.

The seeds/control information. The control information $\omega$ will consist of three randomly
chosen strings $s_\pi, s_T, s_\Gamma$ where $s_\pi, s_T$ are as in the additive errors case. We will take the lengths of
$s_\pi, s_T$ to be $\zeta N$ where $\zeta = \zeta(p, \varepsilon)$ will be chosen small enough compared to $\varepsilon$ (the exact choice of $\zeta$ is
not too important, but we remark that choosing, say, C = e^^ should suffice). 
The third string $s_\Gamma \in \{0,1\}^{\zeta N}$ will be a seed for Nisan's generator Nis for randomized online
branching programs of width $2^{2S}$ with error $\eta_{Nis} \le 2^{-2S}$. Since $S = o(N/\log N)$, a seed length of
$\zeta N$ is sufficient for Nisan's generator from Theorem 7.1. The offset $\Gamma = \mathrm{Nis}(s_\Gamma)$ will be used to
fool the space-bounded channel. We won't need to add the t'-wise independent offset $\Delta$ as we did
in the additive errors case.

Encoding the message. The payload codeword encoding a message m will be $\pi^{-1}(\mathrm{REC}(m)) \oplus \Gamma$,
which is the same as the encoding for the additive channel, with the offset $\Gamma$ added to break
dependencies in the log-space bounded channel instead of the t'-wise independent offset $\Delta$.

We will use the fact that REC is a concatenated code with the following standard components: 

(i) an outer code $\mathrm{REC}^{out}$ (over an alphabet of size $2^a$ for some constant a) of rate at least $1 - \varepsilon/10$
that can correct a constant fraction $\kappa = \kappa(\varepsilon)$ of worst-case errors. As described in [ ], a
variant of the expander codes construction of Spielman [36] gives such a construction (in fact
with linear complexity encoding and decoding) with $a = O(\log(1/\varepsilon))$ and $\kappa = \Omega(\varepsilon^2)$. Since
we can always combine enough symbols to increase the alphabet size without compromising
the fraction of correctable symbol errors, we will assume that $a = \Theta(1/\varepsilon^3)$.

(ii) an inner code $\mathrm{REC}^{in}$ of dimension a, block length $b_{data} \le a/(1 - H(p) - \varepsilon/10)$ that is
capacity-achieving for the binary symmetric channel with crossover probability p with decoding error
probability at most $\kappa/10$. Since $a = \Theta(1/\varepsilon^3) \gg \Omega(\log(1/\kappa)/\varepsilon^2)$, such capacity-achieving codes
exist by Shannon's theorem, and can be found in time $\exp(\mathrm{poly}(1/\varepsilon))$ (which is independent
of N) by a brute-force search.

Thus the payload codeword will naturally be broken into $n_{data}$ "inner chunks" of size $b_{data}$. This
structure of REC will be important to argue that once the control information is correctly recovered,
the decoder can find the actual message. This step is not as easy as in the additive errors case,
since the error distribution is no longer a simple t-wise independent distribution but rather caused
by an arbitrary space-bounded online channel.

For convenience, we will assume that $b_{data}$ will divide the size of the control blocks $b_{ctrl}$ (which
will be $\Theta(\log N)$). Therefore, each block of the final codeword will either be a control block, or a
payload block that consists of $b_{ctrl}/b_{data}$ consecutive "inner chunks" of the concatenated payload
codeword.

Encoding the seeds. The control information (consisting of the seeds $s_\pi, s_T, s_\Gamma$) will be encoded
by a similar structure to the solution for the additive channel: a Reed-Solomon code of rate
$R^{RS} = R^{RS}(p, \varepsilon)$ concatenated with an inner stochastic code. But the stochastic code SC (of Proposition
A.1) will now be replaced by a low-space pseudorandom code LSC with good list decoding properties
and such that the stochastic encoding of every message according to the code is indistinguishable
from a random string by a randomized online space-S channel.

The formal properties needed from LSC are stated in Proposition 7.2 below. We will apply
the Proposition with the choice $S_0 = 2S$ and list decoding radius $\delta = p + \varepsilon < 1/2 - \varepsilon$, obtaining a
code of block length $2\Lambda_0 S$. The size $b_{ctrl}$ of the control blocks will be set to be equal to the block
length $2\Lambda_0 S$ of LSC. Let $n_{ctrl}$ denote the number of control blocks, which also equals the block length
of the Reed-Solomon code. Note that

$$n_{ctrl} \approx \varepsilon N/b_{ctrl} = \Theta(\varepsilon N/S).$$

The encoding of control blocks is exactly as in the additive errors case, with LSC replacing SC.
As in the additive errors case, the control blocks will be interspersed with the payload blocks at
locations specified by the sampler's output on $s_T$.

Rate of the code. The code encoding the control information is of some small constant rate
$R_{ctrl}(p, \varepsilon)$ and the control information consists only of $O(\zeta N)$ bits. Given $\varepsilon$, we can select $\zeta$ small
enough so that the control portion of the codeword only adds at most $\varepsilon N/2$ bits to the overall encoding.
The rate of REC is at least $(1 - \varepsilon/10)(1 - H(p) - \varepsilon/10) \ge 1 - H(p) - \varepsilon/5$. So the rate of the overall
stochastic code is at least $1 - H(p) - \varepsilon$ as desired.
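
The rate accounting above can be checked numerically. The sketch below assumes, as in the text, that the payload code has rate at least $(1 - \varepsilon/10)(1 - H(p) - \varepsilon/10)$ and that the control portion occupies at most an $\varepsilon/2$ fraction of the codeword, so the payload occupies at least a $1 - \varepsilon/2$ fraction. The function names are illustrative.

```python
import math

def binary_entropy(p):
    """Binary entropy H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def payload_rate_lower_bound(p, eps):
    """Rate of the concatenated payload code: outer rate times inner rate."""
    return (1 - eps / 10) * (1 - binary_entropy(p) - eps / 10)

def overall_rate_lower_bound(p, eps):
    """Overall rate when the payload fills at least a (1 - eps/2) fraction of the N bits."""
    return (1 - eps / 2) * payload_rate_lower_bound(p, eps)
```

For example, with p = 0.1 and $\varepsilon$ = 0.05 the overall bound stays above $1 - H(p) - \varepsilon$ with room to spare, since the two losses add up to at most $\varepsilon/5 + \varepsilon/2$.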

We next formally state and prove the properties of the code LSC needed above, thus completing 
the description of the code and encoding function. 

7.4 Low-space pseudorandom stochastic code 

Definition 9 (Decomposable stochastic code). A binary stochastic code with encoding map E where
$E : \{0,1\}^k \times \{0,1\}^s \to \{0,1\}^b$ is said to be decomposable if there exist functions $E_1 : \{0,1\}^k \to \{0,1\}^b$
and $E_2 : \{0,1\}^s \to \{0,1\}^b$ such that $E(x, y) = E_1(x) \oplus E_2(y)$ for every x, y. We say that
such an encoding decomposes as $E = [E_1, E_2]$. □

Definition 10 (List-decodable low-space pseudorandom code). A decomposable binary stochastic
code with encoding map $E : \{0,1\}^k \times \{0,1\}^s \to \{0,1\}^b$ that decomposes as $E = [E_1, E_2]$ is said to
be a $(\delta, L)$-list decodable $(S, \gamma)$-pseudorandom code if the following properties hold:

1. E is $(\delta, L)$-list decodable, i.e., for every $y \in \{0,1\}^b$, there are at most L pairs (m, r) such that
E(m, r) is within Hamming distance $\delta b$ of y.

2. For every $m \in \{0,1\}^k$ and every randomized online branching program B (that can depend on
m, $E_1(m)$) with b input bits and width $2^S$, we have

$$E(m, U_s) \sim^{\gamma}_{B} U_b.$$

The rate of such a code equals k/b, and its seed complexity is s. □

The construction of the necessary stochastic code, whose codewords look random to low-space 
channels, is guaranteed by the following lemma. 

Proposition 7.2 (Inner control codes). For some fixed positive integer $\Lambda_0$ the following holds. For
all $\delta$, $0 < \delta < 1/2$, there exist $R = R(\delta) \ge (1/2 - \delta)^{\Omega(1)} > 0$ and a positive integer
$L = L(\delta) \le 1/(1/2 - \delta)^{O(1)}$ such that for all large enough integers $S_0$, there exists a $2^{O(S_0)}$ time randomized
Monte Carlo construction of a decomposable stochastic code with encoding $E : \{0,1\}^k \times \{0,1\}^s \to \{0,1\}^u$
with $u = \Lambda_0 S_0$, $k \ge Ru$ and $s = 10 S_0$, that is $(\delta, L)$-list-decodable $(S_0, 2^{-S_0})$-pseudorandom
with probability at least $1 - 2^{-u}$.

Further, there exists a deterministic decoding procedure running in time $2^{O(S_0)}$ that, given a
string $y \in \{0,1\}^u$, recovers the complete list of (at most L) pairs (m, r) whose encodings E(m, r)
are within Hamming distance at most $\delta u$ from y.

Proof. The existence will be shown by a probabilistic construction with a decomposable encoding
$E(m, r) = C(m) \oplus \mathrm{BPPRG}(r)$ where C will be (the encoding map of) a linear list-decodable code,
and BPPRG will be a generator that fools width $2^{S_0}$ branching programs, obtained by picking
$\mathrm{BPPRG}(r) \in \{0,1\}^u$ independently and uniformly at random for each seed r. Here $u = \Lambda_0 S_0$
for a large enough absolute constant $\Lambda_0$ as in the statement of the Proposition. Note that the
construction time is $2^{O(u)} = 2^{O(S_0)}$.
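
The shape of this probabilistic construction can be sketched at toy scale. In the code below, codewords are b-bit Python integers, the linear part $E_1$ is a random linear map standing in for the linear list-decodable code C (an assumption made only for illustration; a random linear map carries no list-decoding guarantee at this size), and $E_2$ is a table with one independent uniform string per seed, mirroring BPPRG.

```python
import random

def make_decomposable_code(k, s, b, seed=0):
    """Toy decomposable stochastic code E(m, r) = E1(m) xor E2(r) over b-bit integers."""
    rng = random.Random(seed)
    # E1: random linear map GF(2)^k -> GF(2)^b, stored as one b-bit row per message bit.
    rows = [rng.getrandbits(b) for _ in range(k)]
    # E2: one independent, uniformly random b-bit string for each of the 2^s seeds.
    table = [rng.getrandbits(b) for _ in range(2 ** s)]

    def e1(m):
        word = 0
        for i in range(k):
            if (m >> i) & 1:
                word ^= rows[i]
        return word

    def e2(r):
        return table[r]

    def encode(m, r):
        return e1(m) ^ e2(r)

    return e1, e2, encode
```

Storing the table costs $2^s$ strings of b bits each, which is where the exponential construction time in the proposition comes from.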

List-decoding property. We adapt the proof that a truly random set is list-decodable. Let
$C \subseteq \{0,1\}^u$ be a linear $(\delta, L_C = L_C(\delta))$-list decodable code; such codes exist for rates less than
$1 - H(\delta)$ [ ], and can be constructed explicitly with positive rate $R(\delta) \ge (1/2 - \delta)^{\Omega(1)} > 0$ for any
constant $\delta < 1/2$ with a list size $L_C \le 1/(1/2 - \delta)^{O(1)}$ [ ]. We will show that the composed code
E has constant list-size with high probability over the choice of BPPRG as long as the rate of the
combined code is strictly less than $1 - H(\delta)$.

Fix a ball B' of radius $\delta u$ in $\{0,1\}^u$, and let X denote the size of the intersection of the
image of E with B'. We can view the image of E as a union of $2^s$ sub-codes $C_r$, where $C_r$ is
the translated code $C \oplus \mathrm{BPPRG}(r)$ (for $r \in \{0,1\}^s$). Each sub-code $C_r$ is $(\delta, L_C)$-list-decodable
since it is a translation of C. We can then write $X = \sum_{r \in \{0,1\}^s} X_r$ where $X_r$ is the size of
$C_r \cap B'$. The $X_r$ are independent integer-valued random variables with range $[0, L_C]$ and expectation
$E[X_r] = |C| \cdot |B'|/2^u \le 2^{-u(1 - H(\delta) - R_C)}$ where $R_C$ denotes the rate of C. Therefore,

$$E[X] \le 2^{10 S_0} \, 2^{-u(1 - H(\delta) - R_C)} = 2^{-u(1 - H(\delta) - R_C - 10/\Lambda_0)}.$$

Suppose $R_C + 10/\Lambda_0 = 1 - H(\delta) - \alpha_0$, so that $E[X] \le 2^{-\alpha_0 u}$. Let t be the ratio L/E[X], where
$L = L(\delta)$ is the desired list-decoding bound for the composed code E. We will set $L = 3L_C/\alpha_0$. By
the multiplicative Chernoff bound for bounded random variables, the probability (over the choice
of BPPRG) that X > L is at most $\left(e^{t-1}/t^{t}\right)^{E[X]/L_C}$. Simplifying, we get $\Pr[X > L] \le (e/L)^{L/L_C} \, 2^{-3u} \le 2^{-2u}$.

Taking a union bound over all $2^u$ possible balls B', we get that with probability at least $1 - 2^{-u}$,
the random choice of BPPRG satisfies the property that the decomposable stochastic code with
encoding map $E = C \oplus \mathrm{BPPRG}$ is $(\delta, L)$-list-decodable.

Pseudorandomness. We now establish the pseudorandomness claim. It suffices to prove
the pseudorandomness property against all deterministic online branching programs of width $2^{S_0}$,
since a randomized online branching program is just a distribution over deterministic branching
programs.

Fix an arbitrary codeword C(m). Consider the (multi)set $X_m = \{C(m) \oplus \mathrm{BPPRG}(r)\}$ as r varies
over $\{0,1\}^s$. Each element of this set is chosen uniformly and independently at random from $\{0,1\}^u$.
Fix an online width-$2^{S_0}$ branching program B. By a standard Chernoff bound, the probability,
over the choice of $X_m$, that $\Pr_{x \in X_m}[B(x) = 1]$ deviates from the probability $\Pr[B(U_u) = 1]$ that B
accepts a uniformly random string by more than $\zeta$, in absolute value, is at most $\exp(-\Omega(\zeta^2 |X_m|))$.
For $\zeta = 2^{-S_0}$ and $|X_m| = 2^s = 2^{10 S_0}$, this probability is at most $\exp(-\Omega(2^{8 S_0}))$.

The number of online branching programs of width $2^{S_0}$ on u input bits is at most
$\exp(O(S_0 u) 2^{S_0}) \le \exp(O(2^{2 S_0}))$. By a union bound over all these branching programs, we conclude that except with
probability at most $\exp(-2^{\Omega(S_0)})$ over the choice of BPPRG, the following holds for every online
width-$2^{S_0}$ branching program B with u input bits:

$$\left|\Pr_{x \in X_m}[B(x) = 1] - \Pr[B(U_u) = 1]\right| \le 2^{-S_0}.$$

Since m was arbitrary, we have proved that the constructed stochastic code is $(S_0, 2^{-S_0})$-pseudorandom
with probability at least $1 - 2^{-u}$.

Decoding. Finally, it remains to justify the claim about the decoding procedure. Given a
string $y \in \{0,1\}^u$, the decoding algorithm will go over all $(m, r) \in \{0,1\}^k \times \{0,1\}^s$ by brute force,
and check for each whether $\mathrm{dist}(E(m, r), y) \le \delta u$. By the list-decoding property, there will be at
most L such pairs (m, r). The decoding complexity is $2^{O(k+s)} = 2^{O(S_0)}$. □
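
The brute-force decoder described in this step can be written in a few lines. The sketch below represents codewords as integers and uses a small hand-picked encoding (`toy_encode`, with k = 2, s = 1 and 8-bit codewords) purely as an illustrative assumption; it is not the code LSC from the paper.

```python
def hamming_distance(a, b):
    """Number of bit positions in which the integers a and b differ."""
    return bin(a ^ b).count("1")

def brute_force_list_decode(encode, k, s, y, radius):
    """Return every pair (m, r) whose encoding lies within `radius` of y."""
    return [(m, r)
            for m in range(2 ** k)
            for r in range(2 ** s)
            if hamming_distance(encode(m, r), y) <= radius]

# Toy decomposable encoding: E1 is linear in the bits of m, E2 is a fixed table.
_E1 = [0x00, 0x0F, 0xF0, 0xFF]
_E2 = [0x00, 0x55]

def toy_encode(m, r):
    return _E1[m] ^ _E2[r]
```

The loop visits all $2^{k+s}$ pairs, matching the running time stated in the proof. On the received word `0x0E` (the codeword `0x0F` with one bit flipped), radius 1 returns only `(1, 0)`, while radius 3 returns a longer list.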

7.5 List decoding algorithm 

The decoding algorithm will be similar to the additive case with the principal difference being that
the inner stochastic codes will be decoded as per the procedure guaranteed in Proposition 7.2 (using
time $2^{O(S)}$ for each block). For each block, we obtain a list of L possible pairs of the form $(\alpha_i, a_i)$.
This set of (at most NL) pairs is then fed into the polynomial time Reed-Solomon list decoding
algorithm (guaranteed by Proposition A.3), which returns a list of $\mathrm{poly}(1/\varepsilon)$ values for the control
information. This comprises the first phase of the decoder.

Once a list of control vectors is recovered, the second phase of the decoder will run the decoding 
algorithm for REC for each of these choices of the control information and recover a list of possible 
messages. 

Decoding each of the inner stochastic codes takes time $2^{O(S)}$, and decoding the Reed-Solomon
code as well as REC takes time polynomial in N. So the overall run time is polynomial in
N and $2^S$.

The main theorem about decoding is the following. 

Theorem 7.3. Let $W_S$ be an arbitrary randomized online space-S channel on N input bits that is
pN-bounded with probability $1 - \beta$. Consider the code construction described in Section 7.3 using
component codes REC, a Reed-Solomon code of small enough rate $R^{RS}$, and the code LSC that is
$(p + \varepsilon, L)$-list decodable and $(2S, 2^{-2S})$-pseudorandom (which happens with probability $1 - 2^{-\Omega(S)}$).

Then for every message m, over the choice of control information $s_\pi, s_T, s_\Gamma$
and the errors introduced by $W_S$, the list output by the above-mentioned list decoding algorithm
includes the message m with probability at least

$$1 - 2\beta - N 2^{-\Omega(\varepsilon^3 S)} - 2^{-\Omega(\varepsilon^3 N/S)}.$$

The running time of the decoding algorithm is polynomial in N and $2^S$ (and thus polynomially
bounded in the block length if the space bound is logarithmic).

Since the construction of LSC in Proposition 7.2 guarantees the required pseudorandomness
property with probability $1 - 2^{-\Omega(S)}$, setting $S = \Theta(\log N)$ in the above theorem implies our main
result (Theorem 3.4) on capacity-achieving codes for list decoding on online log-space channels.

The novelty compared to the additive errors case is in the analysis of the decoder, which is more 
subtle since we have to deal with a much more powerful channel. The remainder of this section 
deals with this analysis, which will establish the validity of Theorem 7.3. 

7.6 Analyzing Decoding: Main Steps 

Our analysis requires two main claims.

Lemma 7.4 (Few Control Candidates). The decoder recovers a list of $L' \le \mathrm{poly}(1/\varepsilon)$ candidate
values of the control information. With high probability (specifically, with probability at least
$1 - \beta - \exp(-\Omega(\varepsilon^3 N/S)) - (N + 1) 2^{-2S}$, assuming $S > \log N$), the list includes the correct value
$\omega = (s_\pi, s_T, s_\Gamma)$ of the control information used at the encoder.

Lemma 7.5 (Payload decoding succeeds). Given the correct control information $(\pi, T, \Gamma)$, the
decoder succeeds with high probability. Specifically, the probability of successful decoding is at least
$1 - \beta - N \exp(-\Omega(\varepsilon^3 S))$.

Combining these two lemmas, which we prove in the next two sections, we get that except with
probability at most $2\beta + \exp(-\Omega(\varepsilon^3 N/S)) + N \exp(-\Omega(\varepsilon^3 S))$, the decoder recovers a list of at most
$L' \le \mathrm{poly}(1/\varepsilon)$ potential messages, one of which is the correct original message. This establishes
Theorem 7.3.

7.7 Control Candidates Analysis 

In this section we show that the decoder can recover a small list of candidate control strings, one 
of which is correct with high probability (Lemma 7.4). 

Our analysis follows the case of additive errors, but the relaxed goal of list-decoding simplifies
the analysis of this part considerably. Recall that the sampled set T is "good" for a particular error
pattern (Definition 5) if at least a fraction $\varepsilon$ of the $n_{ctrl}$ control blocks have an error rate (fraction
of flipped bits) bounded above by $p + \varepsilon$.

There were four main lemmas in the analysis of additive errors. The first (Lemma 6.2) stated 
that the error pattern was good for T with high probability. A version of this lemma holds also for 
our space-bounded codes, although the analysis is significantly more subtle. 

Lemma 7.6 (Good Samplers: space-bounded analogue to Lemma 6.2). For every pN-bounded
online space-S channel with $S > \log N$ and for every message m and permutation seed $s_\pi$, with
probability at least $1 - \exp(-\Omega(\varepsilon^3 N/\log N)) - (N + 1) 2^{-2S}$ over the choice of sampler seed $s_T$, the
seed $s_\Gamma$ for the pseudorandom offset, the coins of the control encoding and the coins of the adversary,
the set T is good for the error pattern e.

We defer the proof of this lemma to Section 7.8.2, after we establish a key tool called the 
"Hiding Lemma" which is also used in the analysis of the payload decoding. For now, we turn 
to the second lemma in the analysis of the control information decoding (Lemma 6.3 for additive 
errors), which stated that when the sampled positions T are good for the error pattern e, one can 
correctly recover enough control blocks with relatively few errors. This is no longer true in our 
setting, but we require only the following weaker statement. 

Lemma 7.7 (Correct Control Blocks: list decoding version). For any e, T such that T is good
for e, the decoding algorithm for the inner codes LSC outputs a list of L symbols containing the
correct symbol $a_i$ for at least $\varepsilon n_{ctrl} = \Theta(\varepsilon^2 N/S)$ control blocks.

Proof. The list-decoding radius of the LSC code is set to be $\delta \ge p + \varepsilon$, so every block with an error
rate at most $p + \varepsilon$ produces a list that contains the correct symbol. □

The third lemma from the analysis of additive errors (Lemma 6.4), which previously stated that
very few payload blocks are mistaken for control blocks, requires significant change. It is possible
for the space-bounded channel to inject fake control blocks into the codeword (by changing a block
to some pre-determined codeword of LSC). Therefore we can only say that the total number of
candidate control blocks is small.

Lemma 7.8 (Bounding mistaken control blocks). For every m, e, $\omega$, the total number of candidate
control symbols is at most $\frac{NL}{2\Lambda_0 S}$.

Proof. Since each block has $b_{ctrl} = 2\Lambda_0 S$ bits, there are $\frac{N}{2\Lambda_0 S}$ blocks considered by the decoder.
The list decoding of each such block yields at most L candidate control symbols. □



Putting the pieces together to prove Lemma 7.4. Given Lemmas 7.6, 7.7 and 7.8, we only
need to ensure that the rate $R^{RS}$ of the Reed-Solomon code used at the outer level to encode the
control information is small enough so that list decoding is possible according to Proposition A.3
as long as (1) the number of data pairs n is at most $\frac{NL}{2\Lambda_0 S}$ and (2) the number of agreements t
is at least $\Omega(\varepsilon^2 N/S)$. The claimed list decoding is possible with rate $R^{RS} = O(\varepsilon^3/L)$, and the list
decoder will return at most $O(L/\varepsilon^2)$ candidates for the control information. Since the list decoding
radius $\delta$ of LSC was chosen to be $\delta = p + \varepsilon < 1/2 - \varepsilon$, we have $L \le 1/\varepsilon^{O(1)}$ by the guarantee of
Proposition 7.2, so the output list size is bounded by a polynomial in $1/\varepsilon$. This proves Lemma 7.4;
the claimed failure probability is obtained by adding the probability $\beta$ that the channel flips more
than pN bits, and the failure probability of the sampler from Lemma 7.6.



7.8 Payload Decoding Analysis 

We now turn to the analysis of the decoding of the payload codeword and the task of proving 
Lemmas 7.6 ("good samplers") and 7.5 (correct final decoding). We first develop a key tool, the 
"Hiding Lemma." 



7.8.1 The Hiding Lemma 

Given a message m, and pseudorandom outputs $\pi, T, \Gamma$ based on the seeds $s_\pi, s_T, s_\Gamma$, let

$$\mathrm{Enc}(m; \pi, T, \Gamma, r_1, \ldots, r_{n_{ctrl}})$$

denote the output of the encoding algorithm when the $r_i$'s are used as the random bits for the
LSC encoding. Let $\mathrm{Enc}(m; \pi, T, \bullet)$ be a random encoding of the message m using a given $\pi, T$ and
selecting all other inputs at random.

Definition 11 (Conditional Indistinguishability). For random variables X, Y, Z with X, Y defined
on $\{0,1\}^N$, and $\eta > 0$, we say that X and Y are online space-S indistinguishable given Z with
advantage $\eta$ if for all values z of Z, and for all randomized online branching programs $A_z$ (that
could depend nonuniformly on z) with N inputs and width $2^S$, we have $X \sim^{\eta}_{A_z} Y$, where X and Y
are conditioned on Z = z. □

The following crucial lemma lets us limit the damage that an online space-bounded channel can
cause to our codewords.

Lemma 7.9 (Hiding Lemma). For all messages m, sampler sets T and permutations $\pi$, the random
variables $\mathrm{Enc}(m; \pi, T, \bullet)$ and $U_N$ (the uniform distribution on $\{0,1\}^N$) are online space-2S
indistinguishable given $(m, \pi, T)$ with advantage $\eta$, where $\eta$ is at most $N 2^{-2S} + \eta_{Nis} \le (N + 1) \cdot 2^{-2S}$.

We defer the proof of the above lemma to the end of this section. First, we develop a useful 
corollary on error distributions. 

Definition 12 (Error distributions). Given a randomized channel A on N bits and a random
variable D on $\{0,1\}^N$, let $\mathcal{E}_A(D)$ denote the error distribution of A on D, that is $D \oplus A(D)$.

An important consequence of the Hiding Lemma is that even with the knowledge of $\pi$ and T,
the distribution of errors inflicted by a space-bounded channel on a codeword of our code and on
a uniformly random string are indistinguishable by space-bounded tests.

Corollary 7.10 (Errors are Near-Oblivious). Let $W_S$ be a randomized online space-S channel.
For every m, $\pi$, T, the error distributions $\mathcal{E}_{W_S}(U_N)$ and $\mathcal{E}_{W_S}(\mathrm{Enc}(m; \pi, T, \bullet))$ are online space-S
indistinguishable given $(m, \pi, T)$, with advantage at most $(N + 1) 2^{-2S}$.

Proof. One can compose a distinguisher for the two error random variables with the channel $W_S$
to get a distinguisher for the original distributions of the Hiding Lemma. This composition can be
achieved while maintaining the online restriction and the space usage is the sum of the space of
$W_S$ and the distinguisher (that is, at most 2S). □
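
The composition in this proof can be sketched directly: the combined program reads each input bit once, advances the channel, and feeds the resulting error bit to the test. Its state is the pair (channel state, test state), so its space is the sum of the two. The concrete channel and test below (`flip_first_ones`, `count_errors_capped`) are hypothetical examples chosen for illustration.

```python
def flip_first_ones(budget):
    """Channel step: flip a 1 to 0 until `budget` flips are used; state = flips so far."""
    def step(q, bit):
        if bit == 1 and q < budget:
            return q + 1, 0
        return q, bit
    return step

def count_errors_capped(cap):
    """Test step on the error stream: count error bits seen, saturating at `cap`."""
    return lambda q, err: min(q + err, cap)

def compose(channel_steps, test_steps):
    """Online program on input x whose state is the pair (channel state, test state)."""
    def make(ch, ts):
        def step(state, bit):
            q_ch, q_test = state
            q_ch, out_bit = ch(q_ch, bit)
            return q_ch, ts(q_test, bit ^ out_bit)  # bit xor out_bit is the error bit
        return step
    return [make(ch, ts) for ch, ts in zip(channel_steps, test_steps)]

def run(steps, start, x):
    """Run a sequence of transition functions over x and return the final state."""
    q = start
    for step, bit in zip(steps, x):
        q = step(q, bit)
    return q
```

Because each layer of the composed program consumes exactly one input bit, the online restriction is preserved, which is the point the proof relies on.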



Proof of the Hiding Lemma (Lemma 7.9) 

Proof. The proof proceeds by a hybrid argument. Fix m, $\pi$, T, and recall that $|T| = n_{ctrl}$ is the
number of control blocks. Let $D_0$ be the random variable $\mathrm{Enc}(m; \pi, T, \bullet)$, and $D_2$ be the uniform
distribution over $\{0,1\}^N$. We will define an intermediate random variable $D_1$, in which the control
blocks of $D_0$ are replaced by fresh uniformly random strings. We show that $D_0$ and $D_2$ are both
indistinguishable from $D_1$, and hence from each other.

For notational convenience, suppose that $\pi$ is the identity permutation, and that the set T,
which dictates the locations of the control blocks, occupies the last $\ell = n_{ctrl}$ locations so that the
control information is sent at the end of the codeword (the proof works identically for any other
fixed pair T, $\pi$). Let the encoding LSC decompose as $[C^*, \mathrm{BPPRG}]$. We can then write $D_0$ as

$$D_0 = (\text{payload} \oplus \Gamma \,\|\, c_1 \oplus \mathrm{BPPRG}(r_1) \,\|\, \cdots \,\|\, c_\ell \oplus \mathrm{BPPRG}(r_\ell)),$$

where $r_i$ is the randomness used by the stochastic encoder LSC and $c_1, \ldots, c_\ell$ are codewords of $C^*$.
Similarly, we can write $D_1 = (\text{payload} \oplus \Gamma \,\|\, U_{b_{ctrl}} \,\|\, \cdots \,\|\, U_{b_{ctrl}})$.

If v is the concatenated string $\mathrm{BPPRG}(r_1) \| \cdots \| \mathrm{BPPRG}(r_\ell)$, then conditioned on m, $\pi$, T and
$\Gamma$, any branching program that distinguishes $D_1$ from $D_0$ can be used to construct a branching
program of the same complexity that distinguishes v from uniform, by hard-wiring in the string
$(\text{payload} \oplus \Gamma \,\|\, c_1 \,\|\, \cdots \,\|\, c_\ell)$. By the $(2S, 2^{-2S})$-pseudorandom property of LSC and a standard hybrid
argument, no online space-2S branching program can distinguish v from uniform with advantage
better than $\ell \cdot 2^{-2S} \le N 2^{-2S}$. Hence $D_0$ and $D_1$ cannot be distinguished with advantage greater than
$N 2^{-2S}$.






We now show that $D_1$ and $D_2$ cannot be distinguished with advantage greater than $\eta_{Nis}$. Note that
in both distributions, the last $\ell$ blocks are uniform and independent of the long payload block. A
randomized online space-2S branching program A that distinguishes $D_1$ from $D_2$ with advantage
$\eta$ can be turned into a distinguisher $B_{payload}$, defined by $B_{payload}(z) = A(z \oplus \text{payload} \,\|\, U_{b_{ctrl}} \,\|\, \cdots \,\|\, U_{b_{ctrl}})$,
that distinguishes $\Gamma$ from uniform with advantage $\eta$. We can represent $B_{payload}$ as a randomized
online branching program with the same width as A, that is, using space at most 2S. By the property of the
pseudorandom generator Nis, we conclude that A's advantage in distinguishing $D_1$ from $D_2$ is at
most $\eta_{Nis} \le 2^{-2S}$. □

7.8.2 Proof that Errors are Well Distributed (Lemma 7.6) 

The fact that log-space errors are nearly oblivious (Corollary 7.10) allows us to apply much of the 
analysis from the additive channels setting directly. The main observation is that one can check in 
log-space whether a set T is good for a given error pattern. 

Proof of Lemma 7.6. Recall that the sampled set $T$ is "good" for a particular error pattern (Definition 5)
if at least a fraction $\varepsilon$ of the $n_{\mathrm{ctrl}}$ control blocks have an error rate (fraction of flipped
bits) bounded above by $p + \varepsilon$. We want to show that good sampler sets arise with high probability.
Let $\mathrm{GOOD}_T(e)$ be the predicate which tests if $T$ is good for an error vector $e$. For a given $T$, one
can evaluate $\mathrm{GOOD}_T$ in space at most $\log N$ (one needs two counters of size roughly $\log(b_{\mathrm{ctrl}})$ and
$\log(n_{\mathrm{ctrl}}) < \log(N/b_{\mathrm{ctrl}})$, respectively: one keeps track of the number of errors in a given block and
the other, of the number of blocks seen so far where the number of errors is above the threshold).
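The two-counter evaluation of the goodness predicate can be sketched as a single pass over the error vector. This is only an illustration under simplifying assumptions that are not in the paper: blocks are taken to have one common length `block_len`, the set $T$ is given as a set of block indices `control_blocks`, and the function and parameter names are hypothetical.

```python
from typing import Iterable, Set

def good_for_errors(error_bits: Iterable[int], control_blocks: Set[int],
                    block_len: int, p: float, eps: float) -> bool:
    """One-pass test of whether the control-block set is 'good' for an error vector.

    Assumption for this sketch: the word is cut into consecutive blocks of
    equal length `block_len`, and `control_blocks` holds the block indices in T.
    """
    threshold = (p + eps) * block_len  # most errors a lightly corrupted block may have
    errors_in_block = 0                # counter 1: errors seen in the current block
    bad_blocks = 0                     # counter 2: control blocks above the threshold
    for pos, bit in enumerate(error_bits):
        errors_in_block += bit
        if (pos + 1) % block_len == 0:  # end of a block
            block_index = pos // block_len
            if block_index in control_blocks and errors_in_block > threshold:
                bad_blocks += 1
            errors_in_block = 0
    lightly_corrupted = len(control_blocks) - bad_blocks
    return lightly_corrupted >= eps * len(control_blocks)
```

Apart from the position in the stream (which a branching program gets for free from its layer index), the only state carried between bits is the pair of counters, which is what keeps the test within logarithmic space.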

Fix a sampler seed $s_T$. Now consider a $pN$-bounded online space-$S$ channel $W_S$, for $S > \log(N)$.
By Corollary 7.10, the errors introduced by $W_S$ are indistinguishable by the test $\mathrm{GOOD}_T$ from a
random error pattern, $\mathcal{E}_{W_S}(U_N)$, that does not depend on $T$. Thus, for every $T$, we have

$$\Pr_{m,T,\pi\ \text{fixed}}\big(T \text{ good for errors introduced by } W_S\big) \;\ge\; \Pr_{T\ \text{fixed}}\big(T \text{ good for } \mathcal{E}_{W_S}(U_N)\big) - (N+1)2^{-2S}, \qquad (1)$$

where the probability space on the left-hand side consists of the coins of $W_S$ and the offset seeds
in the encoding, and the probability space on the right consists of the coins of $W_S$ and the uniform
random string $U_N$.

We can now average over the choice of $T$ in (1). On the left-hand side, we get the overall
probability of a good error pattern (this is the quantity we are trying to bound) for a fixed message
$m$ and permutation $\pi$. On the right-hand side, we get the probability that $T$ is good for a
$pN$-bounded error vector drawn independently of $T$.

Now the original "good sampler lemma" (Lemma 6.2) states that for every error vector $e$
of relative weight at most $p$, with probability at least $1 - \exp(-\Omega_\varepsilon(N/\log N))$ over the choice of
sampler seed $s_T$, the set $T$ is good for $e$. This directly extends to error vectors drawn independently
from $T$. We obtain

$$\Pr_{m,\pi\ \text{fixed}}\big(T \text{ good for errors introduced by } W_S\big) \;\ge\; 1 - \exp(-\Omega_\varepsilon(N/\log N)) - (N+1)2^{-2S}. \qquad \Box$$

7.8.3 Proof of Lemma 7.5 

Armed with the Hiding Lemma, we return to the task of proving Lemma 7.5 on the claim that the
payload decoding succeeds. Recall that for this part, we can assume that the decoder is given the
correct control information $\omega = (\pi, T, \Gamma)$. We are thus in the shared randomness setting.

We will use the Hiding Lemma (actually, Corollary 7.10) to argue that the events that we
needed to happen with high probability for successful decoding against additive errors (where the
errors were oblivious to the codeword) will also happen with good probability against the online
space-$S$ channel $W_S$. However, the high probability guarantee we will be able to prove is weaker,
being $1 - N^{-\Omega(1)}$ when $S = O(\log N)$.

In the following, we fix the message m and the choice T of the control block locations. 

Definition 13 (Friendly errors). For an error vector $e \in \{0,1\}^N$ with Hamming weight at most
$pN$, define $e$ to be friendly for $(\pi, T)$ if the permuted error vector $\pi(e|_{\overline{T}})$ is an error pattern on
which the decoder for the code REC succeeds.

The decoder for REC is the standard "hard decoder" for concatenated codes: it decodes each
inner block by brute force to the message whose encoding by $\mathrm{REC}^{\mathrm{in}}$ is closest to it, and
runs a unique decoder to correct up to a fraction $\kappa = \kappa(\varepsilon)$ of errors for the outer code $\mathrm{REC}^{\mathrm{out}}$. By
the linearity of REC, the success of the decoding depends only on the error pattern, and not on the
message. In particular, $e$ is friendly for $(\pi, T)$ if and only if decoding $\pi(e|_{\overline{T}})$ leads to the all zeroes
message.

The following key lemma says that the error caused by any randomized online space-$S$ bounded
channel is likely to be friendly and thus lead to successful decoding of REC. This lemma immediately
implies Lemma 7.5.

Lemma 7.11 (Errors are likely to be friendly). Let $W_S$ be any randomized online space-$S$ channel
on $N$ bits that causes at most $pN$ errors with probability $1 - \beta$. For all subsets $T$ of control block
locations, with probability at least

(taken over $\pi$ and the choice of $e$ according to $\mathcal{E}_{W_S}(\mathrm{Enc}(m; T, \cdot))$), $e$ is friendly for $(\pi, T)$.

The proof of the above lemma (which appears in Section 7.8.4 below) is one of the key difficulties in
the analysis compared to the additive errors case. Another principal difference was that we could
not argue that the number of payload blocks mistaken for control blocks was small, but allowing
for list decoding enabled getting around this difficulty relatively easily.

7.8.4 Proof of Lemma 7.11 

Let us denote $W = W_S$ for ease of notation. Our plan to prove Lemma 7.11 is the following: By
Corollary 7.10 of the Hiding Lemma, we know that $\mathcal{E}_W(\mathrm{Enc}(m; T, \cdot))$ is online space-$S$ indistinguishable
(with advantage $(N+1)2^{-2S}$) from $\mathcal{E}_W(U_N)$. The latter error distribution is oblivious to the
actual codeword, and so by the analysis of the additive errors case (proof of Theorem 6.1), we know
that an error vector distributed according to $\mathcal{E}_W(U_N)$ is friendly for $(\pi, T)$ with high probability
(specifically at least $1 - \beta - \exp(-\Omega_\varepsilon(N/\log N))$, where the $\beta$ term accounts for the chance that
$W$ causes more than $pN$ errors).

If $\mathcal{E}_W(\mathrm{Enc}(m; T, \cdot))$ were statistically close to $\mathcal{E}_W(U_N)$, we could conclude that errors distributed
according to $\mathcal{E}_W(\mathrm{Enc}(m; T, \cdot))$ are also friendly for $(\pi, T)$ and we would be done. However, these
error distributions are only online space-$S$ indistinguishable. So in order to apply this style of
reasoning, we would need to argue that checking whether an input error vector $e \in \{0,1\}^N$ is
friendly for $(\pi, T)$ can be done by an online space-$S$ machine (that can depend non-uniformly on
$\pi$).

This task amounts to checking that $\pi(e|_{\overline{T}})$ decodes to the all zeroes message. Since the decoder
for REC corrects a fraction $\kappa$ of worst-case errors for the outer code $\mathrm{REC}^{\mathrm{out}}$, this condition is met if
at most a $\kappa$ fraction of inner blocks (each with $b_{\mathrm{data}}$ bits, corresponding to a codeword of $\mathrm{REC}^{\mathrm{in}}$) are
decoded to a non-zero message. This check could potentially be made in low space if we
had access to $e$ permuted according to $\pi$. Unfortunately, we are only guaranteed indistinguishability
against tests with online access to $e$. We do not know of a method for checking friendliness of $e$
for $(\pi, T)$ in online space-$S$.

We therefore resort to a more complicated and indirect argument. We follow ideas from our
earlier work [ : ] on the correction of $t$-wise independent errors. Specifically, we will show that
any particular set of $S/b_{\mathrm{data}}$ blocks of the concatenated code behave as they would for binary
symmetric errors. We can then use concentration bounds for $t$-wise independent random variables
with $t = S/b_{\mathrm{data}}$ to argue that decoding succeeds with high probability.

Lemma 7.12. For every subset $P$ of at most $S/b_{\mathrm{data}}$ positions of $\mathrm{REC}^{\mathrm{out}}$, the probability (taken
over $\pi$ and the choice of $e$ according to $\mathcal{E}_W(m; T, \cdot)$) that the inner decodings for $\mathrm{REC}^{\mathrm{in}}$ on $\pi(e|_{\overline{T}})$
return a non-zero message for every position in $P$ is at most $(\kappa/5)^{|P|} + (N+1)2^{-2S}$.

Proof. Since $|P|\, b_{\mathrm{data}} \le S = o(N/\log N)$, by the $\Omega(N/\log N)$-wise independence of the permutation
$\pi$, for each fixed error vector $e$, and therefore also for $e$ chosen according to $\mathcal{E}_W(U_N)$, the probability
that all positions in $P$ are decoded incorrectly is at most $(\kappa/10)^{|P|}$ plus an additive error term that is
small enough for the total to be at most $(\kappa/5)^{|P|}$. The
condition that all inner blocks of $\mathrm{REC}^{\mathrm{in}}$ corresponding to positions in $P$ are incorrectly decoded can
be checked by an online branching program of width $2^S$, by simply keeping the at most $(S/b_{\mathrm{data}}) \cdot
b_{\mathrm{data}} = S$ bits in the blocks corresponding to $P$ in memory. Since $\mathcal{E}_W(m; T, \cdot)$ and $\mathcal{E}_W(U_N)$ are online
space-$S$ indistinguishable with advantage $(N+1)2^{-2S}$, we get the conclusion of the lemma. □

To complete the argument and prove that at most a $\kappa$ fraction of positions of $\mathrm{REC}^{\mathrm{out}}$ are decoded
incorrectly, we need the following probabilistic fact. We include a proof, which is based on ideas
used to prove similar statements in [ '], for completeness.

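Claim 7.1. Let $a \in (0, 1/3)$ and $X_1, X_2, \dots, X_n$ be 0-1 valued random variables with $\Pr[X_i = 1] \le a$
for $i = 1, 2, \dots, n$. Further assume that for every subset $P \subseteq \{1, 2, \dots, n\}$ of size $\ell$,

$$\Pr\Big[\bigwedge_{i \in P} (X_i = 1)\Big] \le a_\ell \qquad (2)$$

for some $a_\ell \ge 0$. Then

$$\Pr\Big[\sum_{i=1}^{n} X_i \ge 3an\Big] \le a_\ell \left(\frac{n-\ell}{3an-\ell}\right)^{\ell}. \qquad (3)$$

Proof. Let $S_\ell$ denote the number of subsets $P$ of size $\ell$ such that $X_i = 1$ for every $i \in P$. If
$\sum_i X_i \ge 3an$, then $S_\ell \ge \binom{3an}{\ell}$, so the probability in (3) is at most $\mathbb{E}[S_\ell] / \binom{3an}{\ell}$
using Markov's inequality. By linearity of expectation and the hypothesis (2), we have $\mathbb{E}[S_\ell] \le
\binom{n}{\ell} a_\ell$. Since $3an \le n$, $\binom{n}{\ell} / \binom{3an}{\ell} \le \left(\frac{n-\ell}{3an-\ell}\right)^{\ell}$ and the claim follows. □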

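Let us now combine Lemma 7.12 and Claim 7.1 applied with the choice $\ell = S/b_{\mathrm{data}}$, $a =
\kappa/5 + (N+1)2^{-2S}$, $a_\ell = (\kappa/5)^\ell + (N+1)2^{-2S}$, and $n = n_{\mathrm{data}}$, which is the number of data blocks
of the payload codeword (which is also the block length of $\mathrm{REC}^{\mathrm{out}}$).

As $n_{\mathrm{data}} = \Omega(N)$ and $\ell \le S \le o(N/\log N)$, we have $\frac{3an-\ell}{n-\ell} \ge 2a$ for large enough $N$. Also
$a_\ell = (\kappa/5)^\ell + (N+1)2^{-2S} \le 2N(\kappa/5)^\ell$, since $(\kappa/5)^\ell \ge 2^{-2S}$ for our choice of parameters
($\ell$ is a small fraction of $S$ and $\kappa \ge \varepsilon^{O(1)}$). Since $3a = 3\kappa/5 + o_N(1) \le \kappa$,
the tail bound (3) implies that the probability (taken over $\pi$ and the choice of $e$ according to
$\mathcal{E}_W(m; T, \cdot)$) that more than a fraction $\kappa$ of the inner blocks corresponding to $\mathrm{REC}^{\mathrm{in}}$ are incorrectly
decoded is at most

$$\frac{a_\ell}{(2a)^\ell} \;\le\; \frac{2N(\kappa/5)^\ell}{2^\ell \big(\kappa/5 + (N+1)2^{-2S}\big)^\ell} \;\le\; \frac{2N}{2^\ell} \;=\; \frac{2N}{2^{S/b_{\mathrm{data}}}}.$$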

This finishes the proof of Lemma 7.11, which in turn implies Lemma 7.5 and completes the proof 
of our main Theorem 7.3 on space-bounded channels. 
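The counting step behind Claim 7.1 (Markov's inequality applied to the number of all-ones subsets of size $\ell$) can be sanity-checked exactly in the special case of independent variables, where one may take $a_\ell = a^\ell$. The sketch below is only an illustration: independence is an assumption made here for the check, the concrete numbers are arbitrary, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_upper_tail(n: int, a: Fraction, threshold: int) -> Fraction:
    """Exact Pr[X >= threshold] for X ~ Binomial(n, a)."""
    return sum(comb(n, j) * a**j * (1 - a)**(n - j) for j in range(threshold, n + 1))

def subset_counting_bound(n: int, a_ell: Fraction, threshold: int, ell: int) -> Fraction:
    """Markov bound E[S_ell] / C(threshold, ell), using E[S_ell] <= C(n, ell) * a_ell."""
    return Fraction(comb(n, ell)) * a_ell / comb(threshold, ell)

n, a, ell = 60, Fraction(1, 5), 5
threshold = int(3 * a * n)                     # the value 3an
tail = binomial_upper_tail(n, a, threshold)    # exact probability of >= 3an ones
bound = subset_counting_bound(n, a**ell, threshold, ell)
```

For these values the exact tail lies below the counting bound, which in turn lies below $a_\ell \big((n-\ell)/(3an-\ell)\big)^\ell$, matching the chain of inequalities in the argument.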



8 Time-Bounded Channels 

The ideas behind our code construction for online space-bounded channels are quite general and 
can be extended to construct codes against more powerful channels provided we have the necessary 
explicit pseudorandom generators that can play the role of Nisan's generator for branching pro- 
grams. In this section, we focus on channels which can be described by polynomial sized circuits. 
Specifically, we say a channel has circuit size $T$ on inputs of length $N$ if the effect of the channel
can be described by a randomized circuit of at most $T$ gates.

Construction. Suppose we desire a code of block length $N$ and rate $1 - H(p) - \varepsilon$ that can be
list-decoded from errors caused by a channel of circuit size $N^c$ that flips at most $pN$ bits with high
probability. We can use a similar construction scheme to the online space-bounded case, with the
size of the control blocks $b_{\mathrm{ctrl}} = c' \log N$ for a suitable $c'$ (chosen large enough compared to $c$), and
with the components LSC and the generator Nis changed (as described below) to accommodate the
more powerful channel.

The inner code LSC used for the control information encoding will be replaced by a $(\delta, L)$-list-decodable
code whose codewords are indistinguishable from $U_{b_{\mathrm{ctrl}}}$ with advantage $N^{-c}$ by (randomized)
circuits of size $N^{c+O(1)}$. A Monte Carlo construction similar to the one described in Proposition
7.2 can construct such a code with probability $1 - N^{-\Omega(1)}$ in $\mathrm{poly}(N)$ time.

The generator Nis will be replaced by an efficiently computable pseudorandom generator PolyPRG
of constant stretch, mapping a seed of length linear in $N$ to $\{0,1\}^N$, such that the output of PolyPRG
fools all circuits $C$ of size $N^{c+O(1)}$; that is, the output of PolyPRG on a uniformly random seed is
indistinguishable from $U_N$ by such circuits.

Such a pseudorandom generator which is computable in $\mathrm{poly}(N)$ time exists under computational
assumptions. For instance, the existence of one-way functions suffices [40, 19], as does the worst-case
complexity assumption that $\mathrm{E} \not\subseteq \mathrm{SIZE}(2^{\varepsilon_0 n})$ for some absolute constant $\varepsilon_0 > 0$ ["] where
$\mathrm{E} = \mathrm{DTIME}(2^{O(n)})$ and $\mathrm{SIZE}(2^{\varepsilon_0 n})$ denotes the class of languages that have size $O(2^{\varepsilon_0 n})$ circuits.

Decoding algorithm and its analysis. The decoding algorithm is identical to the algorithm 
described in Section 7.5 for the case of online space-bounded channels. 

Turning to the analysis, the part about recovering the control information applies verbatim, 
and implies that the list decoding of the control information succeeds in finding the correct control 
information with high probability. An analog of the Hiding Lemma 7.9 (and its Corollary 7.10) 
where indistinguishability is with respect to size $N^{c+O(1)}$ circuits follows with an identical argument.
The analog of Lemma 7.11, which was at the heart of the proof that the payload decoding also
succeeds w.h.p., is in fact easier to prove for polynomial-sized circuits and implies that the error
vector caused by the size $N^c$ channel is friendly for $\pi$ with high probability. (The proof is easier
since a circuit of some fixed polynomial size can perform the check that an error vector $e$ is friendly
for $\pi$; recall that this was the difficulty in the online space-bounded case and we required a more
complex argument). We can thus prove the following formal statement for coding against channels
of polynomial size.

Theorem 8.1. Assume either $\mathrm{E} \not\subseteq \mathrm{SIZE}(2^{\varepsilon_0 n})$ for some $\varepsilon_0 > 0$ or the existence of one-way functions.
For all constants $\varepsilon > 0$, $p \in (0, 1/2)$, and $c > 1$, and for infinitely many integers $N$,
there exists a Monte Carlo construction (succeeding with probability $1 - N^{-\Omega(1)}$) of a stochastic
encoder/decoder pair (Enc, Dec) with the following properties:

• Enc encodes a message of length $RN \ge (1 - H(p) - \varepsilon)N$ bits into $N$ bits.

• (Enc, Dec) runs in time 

• For every received word $r \in \{0,1\}^N$, the decoder Dec($r$) outputs a list of at most $\mathrm{poly}(1/\varepsilon)$
candidate messages.

• For all messages $m \in \{0,1\}^{RN}$, and for all randomized channels $W$ with circuit size $N^c$ (which
could depend non-uniformly on $m$) that cause at most $pN$ errors with probability $1 - N^{-\Omega(1)}$,
the list output by the decoder contains $m$ with probability at least $1 - N^{-\Omega(1)}$ (taken over the
stochastic encoding and the channel noise).

A Ingredients for Code Construction for Additive Errors

In this section, we will describe the various ingredients that we will need in our construction of 
capacity achieving AVC codes, expanding on the brief mention of these from Section 6.1. 

A.1 Constant rate codes for average error

By plugging in an appropriate explicit construction of list-decodable codes (with sub-optimal rate)
into Theorem 4.2, we can also get the following explicit constructions of stochastic codes, albeit
not at capacity. We will make use of these codes to encode blocks of logarithmic length control
information in our final capacity-achieving explicit construction. The total number of bits in all
these control blocks together will only be a small fraction of the total message length. So the
stochastic codes encoding these blocks can have any constant rate, and this allows us to use any
off-the-shelf explicit constant rate list-decodable code in Theorem 4.2 (in particular, we do not
need a brute-force search for small list-decodable codes of logarithmic block length). We get the
following claim by choosing $d = 1$ and picking $C$ to be a binary linear $(\alpha, c_1(\alpha)/2)$-list decodable
code in Theorem 4.2.

Proposition A.1. For every $\alpha$, $0 < \alpha < 1/2$, there exist $c_0 = c_0(\alpha) > 0$ and $c_1 = c_1(\alpha) < \infty$
such that for all large enough integers $b$, there is an explicit stochastic code $\mathrm{SC}_{k,\alpha}$ of rate $1/c_0$ with
encoding $E : \{0,1\}^b \times \{0,1\}^b \to \{0,1\}^{c_0 b}$ that is efficiently strongly $\alpha$-decodable with probability
$1 - c_1 2^{-b}$.

Moreover, for every message and every error pattern of more than a fraction $\alpha$ of errors, the
decoder for $\mathrm{SC}_{k,\alpha}$ returns $\perp$ and reports a decoding failure with probability $1 - c_1 2^{-b}$.

Further, there exists a constant $c_3 = c_3(\alpha)$ such that on input a uniformly random
string $y$ from $\{0,1\}^{c_0 b}$, the decoder for $\mathrm{SC}_{k,\alpha}$ returns $\perp$ with probability at least $1 - c_1 2^{-b}$ (over the
choice of $y$).

Proof. The claim follows by choosing $d = 1$ and picking $C$ to be a binary linear $(\alpha, c_1(\alpha)/2)$-list
decodable code in Theorem 4.2. The claim about decoding a uniformly random input follows since
the number of strings $y$ which differ from some valid output of the encoder $E$ in at most a fraction
$\alpha$ of positions is at most $2^{2b}\, 2^{H(\alpha) c_0 b}$. By standard entropy arguments, we have $(1 - H(\alpha)) c_0 b +
\log(c_1(\alpha)/2) \ge 3b$ (since the code encodes $3b$ bits, the capacity is $1 - H(\alpha)$, and at most $\log(c_1(\alpha)/2)$
additional bits of side information are necessary to disambiguate the true message from the list).
We conclude that the probability that a random string gets accepted by the decoder is at most

$$2^{-b} \cdot 2^{\log(c_1(\alpha)/2)} \le c_1 2^{-b}. \qquad \Box$$

A.2 Reed-Solomon codes

If $\mathbb{F}$ is a finite field with at least $n$ elements, and $S = (\alpha_1, \alpha_2, \dots, \alpha_n)$ is a sequence of $n$ distinct
elements from $\mathbb{F}$, the Reed-Solomon encoding, $\mathrm{RS}_{\mathbb{F},S,n,k}(m)$, or just $\mathrm{RS}(m)$ when the other parameters
are implied, of a message $m = (m_0, m_1, \dots, m_{k-1}) \in \mathbb{F}^k$ is given by

$$\mathrm{RS}_{\mathbb{F},S,n,k}(m) = (f(\alpha_1), f(\alpha_2), \dots, f(\alpha_n)), \qquad (4)$$

where $f(X) = m_0 + m_1 X + \cdots + m_{k-1} X^{k-1}$. The following is a classic result on unique decoding
Reed-Solomon codes [•>-], stated as a noisy polynomial reconstruction algorithm.
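The encoding map (4) is simply polynomial evaluation, which can be sketched as follows. To keep the example self-contained it assumes a prime field $\mathbb{F}_q$ with elements represented as integers modulo $q$; the paper's definition allows any finite field, and the function name `rs_encode` is a hypothetical choice.

```python
from typing import List, Sequence

def rs_encode(message: Sequence[int], points: Sequence[int], q: int) -> List[int]:
    """Evaluate f(X) = m_0 + m_1 X + ... + m_{k-1} X^{k-1} at each point, over F_q (q prime)."""
    if len(set(p % q for p in points)) != len(points):
        raise ValueError("evaluation points must be distinct in F_q")
    codeword = []
    for alpha in points:
        value = 0
        for coeff in reversed(message):      # Horner's rule
            value = (value * alpha + coeff) % q
        codeword.append(value)
    return codeword
```

For instance, `rs_encode([1, 2, 3], range(7), 7)` evaluates $1 + 2X + 3X^2$ at all seven points of $\mathbb{F}_7$. Two distinct messages of length $k$ agree in at most $k - 1$ positions, since their difference is a nonzero polynomial of degree less than $k$.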

Proposition A.2 (Unique decoding of RS codes). There is an efficient algorithm with running
time polynomial in $n$ and $\log |\mathbb{F}|$ that given $n$ distinct pairs $(\alpha_i, a_i) \in \mathbb{F}^2$, $1 \le i \le n$, and an integer
$k < n$, finds the unique polynomial $f$ of degree at most $k$, if any, that satisfies $f(\alpha_i) = a_i$ for more
than $(n+k)/2$ values of $i$. Note that this condition can also be expressed as $|\{i : f(\alpha_i) = a_i\}| - |\{i :
f(\alpha_i) \ne a_i\}| > k$.

We also state a list-decoding generalization (the version due to Sudan [ ] suffices for our
purposes), which will be used in our result for space-bounded channels.

Proposition A.3 (List decoding of RS codes [•■)T]). There is an efficient algorithm with running
time polynomial in $n$ and $\log |\mathbb{F}|$ that given $n$ distinct pairs $(\alpha_i, a_i) \in \mathbb{F}^2$, $1 \le i \le n$, and integer
$k < n$, finds the set $C$ of all polynomials $f$ of degree at most $k$, if any, that satisfy $f(\alpha_i) = a_i$ for
at least $t$ values of $i$ as long as $t > \sqrt{2kn}$. Moreover, there are at most $\sqrt{2n/k}$ polynomials in the
set $C$.

A.3 Pseudorandom constructs
A.3.1 Samplers

Let $[N] = \{1, 2, \dots, N\}$. If $B \subseteq [N]$ has density $\mu$ (i.e., $\mu N$ elements), then standard tail
bounds imply that for a random subset $T \subseteq [N]$ of size $\ell$, the density of $B \cap T$ is within $\pm\theta$ of
$\mu$ with overwhelming probability (at least $1 - \exp(-c_\theta \ell)$). But picking a random subset of size $\ell$
requires $\approx \ell \log(N/\ell)$ random bits. The following shows that a similar effect can be achieved by a
sampling procedure that uses fewer random bits. The idea is the well known one of using random
walks of length $\ell$ in a low-degree expander on $N$ vertices. This could lead to repeated samples while
we would like $\ell$ distinct samples. This can be achieved by picking slightly more than $\ell$ samples and
discarding the repeated ones. The result below appears in this form as Lemma 8.2 in [ ].
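The tail behaviour of a truly random subset can be computed exactly for small parameters, which gives a feel for the guarantee the sampler is meant to imitate. The sketch below evaluates the hypergeometric deviation probability and compares it with the Hoeffding bound $2\exp(-2\theta^2 \ell)$, which also holds for sampling without replacement. Choosing Hoeffding's bound as the "standard tail bound" is an assumption made here for illustration, and the function name and concrete numbers are hypothetical.

```python
from math import comb, exp

def deviation_probability(N: int, B_size: int, ell: int, theta: float) -> float:
    """Exact Pr[|density of B in T - mu| > theta] for a uniform ell-subset T of [N]."""
    mu = B_size / N
    total = comb(N, ell)
    bad = 0
    for hits in range(0, min(ell, B_size) + 1):
        if abs(hits / ell - mu) > theta:
            # number of ell-subsets containing exactly `hits` elements of B
            bad += comb(B_size, hits) * comb(N - B_size, ell - hits)
    return bad / total

N, B_size, ell, theta = 200, 80, 50, 0.2
exact = deviation_probability(N, B_size, ell, theta)
hoeffding = 2 * exp(-2 * theta**2 * ell)
```

With these numbers the exact deviation probability is well below the Hoeffding bound, and both decay exponentially in $\ell$ for fixed $\theta$.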

Proposition A.4. For every $N \in \mathbb{N}$, $0 < \theta < \mu < 1$, $\gamma > 0$, and integer $\ell \ge \ell_0 = \Omega\big(\tfrac{1}{\theta^2} \log(1/\gamma)\big)$,
there exists an explicit efficiently computable function $\mathrm{Samp} : \{0,1\}^\sigma \to [N]^\ell$ where $\sigma \le O(\log N +
\ell \log(1/\theta))$ with the following property:
For every $B \subseteq [N]$ of size at least $\mu N$, with probability at least $1 - \gamma$ over the choice of a random
$s \in \{0,1\}^\sigma$, $|\mathrm{Samp}(s) \cap B| \ge (\mu - \theta)\,|\mathrm{Samp}(s)|$.

We will use the above samplers to pick the random positions in which the blocks holding encoded
control information are interspersed with the data blocks. The sampling guarantee will ensure that
a reasonable fraction of the control blocks have no more than a fraction $p + \varepsilon$ of errors when the
total fraction of errors is at most $p$.

A.3.2 Almost t-wise independent permutations

Definition 14. A distribution $D$ on $S_n$ (the set of permutations of $\{1, 2, \dots, n\}$) is said to be almost
$t$-wise independent if for every $1 \le i_1 < i_2 < \cdots < i_t \le n$, the distribution of $(\pi(i_1), \pi(i_2), \dots, \pi(i_t))$
for $\pi$ chosen according to $D$ has statistical distance at most $2^{-t}$ from the uniform distribution on
$t$-tuples of $t$ distinct elements from $\{1, 2, \dots, n\}$. □

A uniformly random permutation of $\{1, 2, \dots, n\}$ takes $\log n! = \Theta(n \log n)$ bits to describe.
The following result shows that almost $t$-wise independent permutations can have much shorter
descriptions.

Proposition A.5 ([22]). For all integers $1 \le t \le n$, there exists $\sigma = O(t \log n)$ and an explicit
map $\mathrm{KNR} : \{0,1\}^\sigma \to S_n$, computable in time polynomial in $n$, such that the distribution $\mathrm{KNR}(s)$
for random $s \in \{0,1\}^\sigma$ is almost $t$-wise independent.

A.3.3 t-wise independent bit strings

We will also need small sample spaces of binary strings in $\{0,1\}^n$ which look uniform for any $t$
positions.

Definition 15. A distribution $D$ on $\{0,1\}^n$ is said to be $t$-wise independent if for every $1 \le i_1 <
i_2 < \cdots < i_t \le n$, the distribution of $(x_{i_1}, x_{i_2}, \dots, x_{i_t})$ for $x = (x_1, x_2, \dots, x_n)$ chosen according to
$D$ equals the uniform distribution on $\{0,1\}^t$. □

Using evaluations of degree $t$ polynomials over a field of characteristic 2, the following well known
fact can be shown. We remark that the optimal seed length is about $\frac{t}{2} \log n$ and was achieved in
[ ], but we can work with the weaker $O(t \log n)$ seed length.

Proposition A.6. Let $n$ be a positive integer, and let $t \le n$. There exists $\sigma \le O(t \log n)$ and
an explicit map $\mathrm{POLY}_t : \{0,1\}^\sigma \to \{0,1\}^n$, computable in time polynomial in $n$, such that the
distribution $\mathrm{POLY}_t(s)$ for random $s \in \{0,1\}^\sigma$ is $t$-wise independent.
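A toy instance of the polynomial-evaluation idea can be checked exhaustively. The sketch below works over $\mathrm{GF}(8)$, takes the seed to be the $t$ coefficients of a polynomial of degree less than $t$, and outputs the low-order bit of each evaluation. These concrete choices (field size, degree convention, which bit is kept, all function names) are assumptions made for illustration and need not match the construction the paper has in mind.

```python
from collections import Counter
from itertools import combinations, product

IRREDUCIBLE = 0b1011  # x^3 + x + 1, which defines GF(8)

def gf8_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(8), each represented as a 3-bit integer."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:          # reduce modulo the irreducible polynomial
            a ^= IRREDUCIBLE
    return result

def poly_bits(seed, n=8):
    """Bit i is the low-order bit of f(i), where f has coefficient list `seed`."""
    bits = []
    for point in range(n):
        value = 0
        for coeff in reversed(seed):   # Horner's rule in GF(8)
            value = gf8_mul(value, point) ^ coeff
        bits.append(value & 1)
    return bits

def uniform_on(t, positions):
    """Check that the bits at `positions` are exactly uniform over all 8**t seeds."""
    counts = Counter(
        tuple(poly_bits(seed)[i] for i in positions)
        for seed in product(range(8), repeat=t)
    )
    return len(counts) == 2 ** len(positions) and len(set(counts.values())) == 1

def is_t_wise_independent(t, n=8):
    """Exhaustively test every set of t positions among the n output bits."""
    return all(uniform_on(t, pos) for pos in combinations(range(n), t))
```

The check succeeds because the evaluations of a random polynomial of degree less than $t$ at $t$ distinct points are jointly uniform over the field, so any fixed bit of each evaluation is jointly uniform as well. The seed here is $t$ field elements, that is, $t \log n$ bits when the field has $n$ elements.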

A.4 Capacity achieving codes for t-wise independent errors

Forney [12] constructed binary linear concatenated codes that achieve the capacity of the binary
symmetric channel $\mathrm{BSC}_p$. Smith [ ] showed that these codes also correct patterns of at most a
fraction $p$ of errors w.h.p. when the error locations are distributed in a $t$-wise independent manner
for large enough $t$. The precise result is the following.

Proposition A.7. For every $0 < p < 1/2$ and every $\varepsilon > 0$, there is an explicit family of binary
linear codes of rate $R \ge 1 - H(p) - \varepsilon$ such that a code $\mathrm{REC} : \{0,1\}^{Rn} \to \{0,1\}^n$ of block length $n$
in the family provides the following guarantee. There is a polynomial time decoding algorithm Dec
such that for every message $m \in \{0,1\}^{Rn}$, every error vector $e \in \{0,1\}^n$ of Hamming weight at
most $pn$, and every almost $t$-wise independent distribution $D$ of permutations of $\{1, 2, \dots, n\}$, we
have

$$\mathrm{Dec}(\mathrm{REC}(m) + \pi(e)) = m$$

with probability at least $1 - 2^{-\Omega(\varepsilon^2 t)}$ over the choice of a permutation $\pi \in_R D$, as long as $\omega(\log n) <
t < \varepsilon n/10$. (Here $\pi(e)$ denotes the permuted vector: $\pi(e)_i = e_{\pi(i)}$.)

We will use the above codes (which we denote REC, for "random-error code") to encode the 
actual data in our stochastic code construction. 

B Capacity-achieving codes for average error 

The average error criterion is an extensively studied topic in the literature on arbitrarily varying 
channels; see the survey [2(j] and the many references therein. Here we assume the message is 
unknown to the channel and the decoding error probability is taken over a uniformly random 
choice of the message and the noise of the channel. The following defines this notion for the special 
case of the additive errors. The idea is that we want every error vector to be bad for only a small 
fraction of messages. 

Definition 16 (Codes for average error). A code $C$ with encoding function $\mathcal{E} : \mathcal{M} \to \Sigma^n$ is said to
be (efficiently) $p$-decodable with average error $\delta$ if there is a (polynomial time computable) decoding
function $D : \Sigma^n \to \mathcal{M} \cup \{\perp\}$ such that for every error vector $e \in \Sigma^n$, the following holds for at
least a fraction $(1 - \delta)$ of messages $m \in \mathcal{M}$: $D(\mathcal{E}(m) + e) = m$. □

B.1 Codes for average error from stochastic codes for additive errors

A slightly more general notion of stochastic codes (Definition 3) implies codes for average error.

Definition 17 (Strongly decodable stochastic codes). We say a stochastic code is strongly $p$-decodable
with probability $1 - \delta$ if the decoding function correctly computes both the message $m$ and
randomness $u$ used at the encoder, with probability at least $1 - \delta$. □

Using a strongly decodable stochastic code we can get a code for average error by simply using 
the last few bits of the message as the randomness of the stochastic encoder. If the number of 
random bits used by the stochastic code is small compared to the message length, the rates of the 
codes in the two models are almost the same. 
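To make the rate comparison concrete, suppose (as notation introduced only for this illustration) that the stochastic code has $k$ message bits, $\lambda k$ random bits, and block length $N$. Feeding message bits in place of the random bits gives a deterministic code with $k + \lambda k$ message bits and the same block length, so

```latex
R_{\mathrm{SSC}} = \frac{k}{N},
\qquad
R_{\mathrm{AVC}} = \frac{k + \lambda k}{N} = (1 + \lambda)\, R_{\mathrm{SSC}} .
```

When $\lambda$ is small the two rates nearly coincide, which is the sense in which the rates in the two models are almost the same.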

Observation B.1. A stochastic code SSC that is strongly $p$-decodable with probability $1 - \delta$ gives a
code AVC of the same block length that is $p$-decodable with average error $\delta$. If the ratio of number
of random bits to message bits in SSC is $\lambda$, the rate of AVC is $(1 + \lambda)$ times the rate of SSC.

B.2 Explicit capacity-achieving codes for average error

We would now like to apply Observation B.1 to the stochastic codes constructed in Section 6 and
also construct explicit codes achieving capacity for the average error criterion. For this, we need
to ensure that the decoder for the stochastic code can also recover all the random bits used at
the encoding. We already showed (Lemma 6.5) that the random string $\omega$ comprising the control
information is in fact correctly recovered w.h.p. However, there is no hope to recover all the random
strings $r_1, r_2, \dots, r_\ell$ used by the various SC encodings. This is because some of these control blocks
could incur much more than a fraction $p + \varepsilon$ of errors (or in fact be totally corrupted).

Our idea is to use the same random string $r$ for each of the $\ell$ encodings $\mathrm{SC}(A_i, r)$ in Step 5.
Since each run of SC-Decode is correct with probability at least $1 - c_1/N^2$, by a union bound
over all $n$ blocks, we can claim that all the following events occur with probability at least $1 - c_1/N$
(over the choice of $r$):

Among the control blocks, all of the at least $\varepsilon\ell/2$ control blocks with at most a fraction
$p + \varepsilon$ of errors are decoded correctly, along with the random string $r$, by SC-Decode.
Further, SC-Decode outputs $\perp$ on all the other control blocks. Thus the correct
random string $r$ gets at least $\varepsilon\ell/2$ "votes."

By Lemma 6.4, with probability at least $1 - \exp(-\Omega_\varepsilon(N/\log^2 N))$ (over the choice of $\omega$), the
number of payload blocks that get accepted as control blocks is at most $\varepsilon\ell/24$. (Note that this
lemma only used the limited independence of the offset string $\Delta$.)

The above facts imply that the control information $\omega$ is recovered correctly with probability at
least $1 - O(1/N)$ over the choice of $(\omega, r)$ (this is the analog of Lemma 6.5). Also $r$ is the unique
string which will get at least $\varepsilon\ell/2$ votes from the various runs of SC-Decode. Therefore it can be
correctly identified (with probability at least $1 - O(1/N)$ over the choice of $(\omega, r)$) after running
SC-Decode on all the $n$ blocks. We can thus conclude the following result on capacity-achieving
codes for average error (Definition 16).

Lemma B.2 (Polynomially small average error). For every $p \in (0, 1/2)$, and every $\varepsilon > 0$, there is
an explicit family of binary codes of rate at least $1 - H(p) - \varepsilon$ that are efficiently $p$-decodable with
average error $O(1/N)$ where $N$ is the block length of the code.

One can reduce the error probability in this result by using redundant, but $t$-wise independent,
values $r_i$ for the control block encodings. Specifically, let $(r_1, \dots, r_\ell)$ be a random codeword from a
Reed-Solomon code of dimension $\varepsilon\ell/8$ (the simpler construction above corresponds to a majority
code). Then the $r_i$ values are, in particular, $\varepsilon\ell/8$-wise independent. One can modify the proof
of Lemma 6.3 (which states that sufficiently many control blocks are recovered) to rely on only
this limited independence. Under the same conditions that the control information is correctly
recovered, there is enough information to recover the entire vector $r_1, \dots, r_\ell$. We can thus prove the
following:

Theorem B.3 (Exponentially small average error). For every $p \in (0, 1/2)$, and every $\varepsilon > 0$, there
is an explicit family of binary codes of rate at least $1 - H(p) - \varepsilon$ that are efficiently $p$-decodable with
average error $\exp(-\Omega_\varepsilon(N/\log^2 N))$ where $N$ is the block length of the code.

C Impossibility Results for Bit-Fixing Channels when p > 1/4

We show that even very simple channels prevent reliable communication if they can introduce
a fraction of errors strictly greater than 1/4. In particular, this result (a) separates the additive
(i.e., oblivious) error model from bounded-space channels when $p > 1/4$, and (b) shows that some
relaxation of correctness is necessary to handle space- and time-bounded channels when $p > 1/4$.

Theorem C.1 (Impossibility for $p > 1/4$, detailed version). For every pair of randomized
encoding/decoding algorithms Enc, Dec that make $n$ uses of the channel and use a message space whose
size tends to infinity with $n$, if a uniformly random message is sent over the channel, then

1. there is a distribution over memoryless channels that alters at most $n/4$ bits in expectation
and causes a decoding error with probability at least $\frac{1}{2} - o(1)$.

2. for every $0 < \nu < \frac{1}{4}$, there is an online space-$\lceil \log(n) \rceil$ channel $W_2$ that alters at most $n(\frac{1}{4} + \nu)$
bits (with probability 1) and causes a decoding error with probability $\Omega(\nu)$.

Our proof adapts the impossibility results of Ahlswede [ ] on arbitrarily-varying channels. We
present a self-contained proof for completeness. Readers familiar with the AVCs literature will
recognize the idea of symmetrizability from [ ■ ].

The Swapping Channel. We begin by considering a simple swapping channel, whose behavior is
specified by a state vector $s = (s_1, \dots, s_n) \in \{0,1\}^n$. On input a transmitted word $c = (c_1, \dots, c_n) \in
\{0,1\}^n$, the channel outputs $c_i$ in all positions where $c_i = s_i$, and a random bit in all positions
where $c_i \ne s_i$. The bits selected randomly by the channel at different positions are independent.
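The swapping channel is easy to simulate, and doing so makes its symmetry in the roles of the codeword and the state visible. The sketch below is an illustration with hypothetical names; drawing one coin per position, even where the coin is not used, is a choice made here so that two runs with the same seed can be compared position by position.

```python
import random
from typing import List, Sequence

def swapping_channel(c: Sequence[int], s: Sequence[int], rng: random.Random) -> List[int]:
    """Output c_i where c_i == s_i, and a fresh uniform bit where they differ."""
    if len(c) != len(s):
        raise ValueError("codeword and state must have the same length")
    out = []
    for ci, si in zip(c, s):
        coin = rng.getrandbits(1)          # one coin per position keeps runs comparable
        out.append(ci if ci == si else coin)
    return out
```

With identical coins, sending `c` with state `s` and sending `s` with state `c` produce the same output, so the two output distributions coincide.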

There are several equivalent characterizations that help to understand the channel's behavior.
First, we may view the channel as outputting either $c_i$ or $s_i$ (each with probability 1/2), independently
for each position. This view of the channel makes it obvious that the output distribution is symmetric
with respect to the inversion of $c$ and $s$. That is,

$$W_s(c) \text{ and } W_c(s) \text{ are identically distributed.} \qquad (5)$$

The key idea behind our lower bounds is that if $s$ is itself a valid codeword, then the decoder
cannot tell whether $c$ was sent with state $s$, or $s$ was sent with state $c$. If $c$ and $s$ code different
messages, then the decoder will make a mistake with probability at least 1/2.

Note that the expected number of errors introduced by the channel is half of the Hamming
distance $\mathrm{dist}(c, s)$; specifically, the number of errors is distributed as $\mathrm{Binomial}(\mathrm{dist}(c, s), \frac{1}{2})$. As
long as $\mathrm{dist}(c, s)$ is close to $n/2$, then the number of errors will be less than $n(\frac{1}{4} + \nu)$ with high
probability.

Hard Channel Distributions. Given a stochastic encoder $\mathrm{Enc}(\cdot\,;\cdot)$, consider the following
distribution on swapping channels: pick a random codeword in the image of Enc and use it as the
state.

$W^{\mathrm{Enc}}(c)$:
    Select $m', r'$ uniformly at random
    Compute $s \leftarrow \mathrm{Enc}(m', r')$
    Output $W_s(c)$

Lemma C.2. Under the conditions of Theorem C.l, for channel VJ'^'^^'^ : 

(a) The probability of a decoding error on a random message is ^ — o(l). 

(b) The expected number of bits altered by W^™"*") is at most n/4. 

Proof, (a) We are interested in bounding the probability of a decoding error: 
Pr(correct decoding) = Pr ^Dec(W"'"'"(Enc(m, r))) = 

Pr (Dec(WEnc(m',r')(Enc(m,r))) = m 



channel coins 



m,r,m' ,r' 
swapping coins 



Because of the symmetry of the swapping channel, the right hand side is equal to the probability
that the decoder outputs $m'$, rather than $m$. This is a decoding error as long as $m'$ differs from $m$.
We assumed that the size of the message space grows with n, so the probability that $m = m'$ goes
to 0 with n. We use "right" and "wrong" as shorthand for the events that decoding is correct
and incorrect, respectively.



$$\Pr(\text{right}) = \Pr_{m, m'}(\text{decoder outputs } m') \le \Pr(\text{wrong} \lor m = m') \le \Pr(\text{wrong}) + o(1).$$

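Since Pr(right) + Pr(wrong) = 1, the probability of correct decoding is at most $\tfrac12 + o(1)$, and so the probability of a decoding error is at least $\tfrac12 - o(1)$. This proves part (a) of the Lemma.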

It remains to show that the expected number of bit corruptions is at most n/4. This follows 
directly from the following fact, which is essentially the Plotkin bound from coding theory: 

Claim C.1 (Plotkin). If $(m, r)$ is independent of and identically distributed to $(m', r')$, then the
expectation of the distance dist(Enc(m, r), Enc(m', r')) is at most n/2.

Proof. By linearity of expectation, the expected Hamming distance is the sum, over positions i, of
the probability that Enc(m, r) and Enc(m', r') disagree in the ith position. The probability that
two i.i.d. bits disagree is at most 1/2, so the expected distance is at most n/2. □
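
The per-position step can be written out explicitly. As a worked sketch, let $q_i$ denote the probability that the ith bit of a random codeword equals 1; the symbol $q_i$ is introduced here for illustration and does not appear in the original.

```latex
\Pr\bigl[\mathrm{Enc}(m,r)_i \neq \mathrm{Enc}(m',r')_i\bigr]
  = 2\,q_i\,(1 - q_i) \;\le\; \tfrac12,
\qquad\text{so}\qquad
\mathbb{E}\bigl[\mathrm{dist}(\mathrm{Enc}(m,r),\, \mathrm{Enc}(m',r'))\bigr]
  = \sum_{i=1}^{n} 2\,q_i\,(1 - q_i) \;\le\; \frac{n}{2}.
```

The inequality $2q(1-q) \le \tfrac12$ holds for every $q \in [0,1]$, with equality exactly at $q = \tfrac12$.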

Part (b) of the lemma follows since the expected number of errors introduced by the swapping 
channel is half of the Hamming distance between the transmitted word and the state vector. □ 

Bounding the Number of Errors. To prove part (2) of Theorem C.1, we will find a (nonuniform)
channel with a hard bound on the number of bits it alters. In logarithmic space, it is easy for
the channel to count the number of bits it has flipped so far and stop altering bits when a threshold
has been exceeded. The difficult part is to show that such a channel will still cause a significant
probability of decoding error.

As before, the channel will select $m', r'$ at random and run the swapping channel with state
$s = \mathrm{Enc}(m', r')$. In addition, however, it will stop altering bits once the threshold of $n(\tfrac14 + \nu)$ altered bits
has been reached.
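
A minimal sketch of such an error-bounded channel, assuming the threshold is $n(\tfrac14 + \nu)$ and that the state s is available as fixed (nonuniform) advice. The name `bounded_swapping_channel` and the parameter values are hypothetical.

```python
import random
from fractions import Fraction

def bounded_swapping_channel(c, s, threshold, rng):
    """Streaming swapping channel that alters at most `threshold` bits.

    Besides the current position, the only state carried along the stream is
    the counter `altered`, which fits in O(log n) bits.
    """
    altered = 0
    out = []
    for ci, si in zip(c, s):
        yi = ci
        if ci != si and altered < threshold:
            yi = rng.randrange(2)    # fresh random bit at a disagreeing position
            if yi != ci:
                altered += 1         # count only bits that actually changed
        out.append(yi)
    return out

n = 400
nu = Fraction(1, 20)                          # hypothetical slack parameter
threshold = int(n * (Fraction(1, 4) + nu))    # n(1/4 + nu) = 120
rng = random.Random(2)
c = [0] * n
s = [1] * n                                   # extreme case: state disagrees everywhere
y = bounded_swapping_channel(c, s, threshold, rng)
errors = sum(a != b for a, b in zip(c, y))
```

Even when the state disagrees with the transmitted word in every position, the number of altered bits never exceeds the threshold.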

Consider now the transmission of a random codeword $c = \mathrm{Enc}(m, r)$. Let $G$ be the event that
dist(c, s) $\le n(\tfrac12 + \nu)$. By a Markov bound, the probability of $\bar{G}$ is at most $\frac{1}{1+2\nu}$, and so the
probability of $G$ is $1 - \Pr(\bar{G}) \ge \frac{2\nu}{1+2\nu} \ge \nu$. Conditioned on $G$, the number of bits altered by $W_s$
on input $c$ is dominated by Binomial$(n(\tfrac12 + \nu), \tfrac12)$. The probability that the number of bits altered
exceeds $n(\tfrac14 + \nu)$ is therefore at most $\exp(-\Omega(\nu^2 n))$.
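
Both bounds in this paragraph can be worked out explicitly. The sketch below assumes that $G$ is the event dist(c, s) $\le n(\tfrac12 + \nu)$, that the expected distance is at most $n/2$ (Claim C.1), and that $\nu \le \tfrac12$. It uses Hoeffding's inequality for the tail, which is one standard choice; the paper does not say which concentration bound it has in mind. The symbols $X$ and $N$ are introduced here for illustration.

```latex
% Markov bound on the complement of G
\Pr[\bar{G}] = \Pr\bigl[\mathrm{dist}(c,s) > n(\tfrac12 + \nu)\bigr]
  \le \frac{n/2}{n(\tfrac12 + \nu)} = \frac{1}{1 + 2\nu},
\qquad
\Pr[G] \ge \frac{2\nu}{1 + 2\nu} \ge \nu .

% Hoeffding bound for X ~ Binomial(N, 1/2) with N = n(1/2 + nu)
\mathbb{E}[X] = n\bigl(\tfrac14 + \tfrac{\nu}{2}\bigr),
\qquad
\Pr\bigl[X > n(\tfrac14 + \nu)\bigr]
  = \Pr\bigl[X - \mathbb{E}[X] > \tfrac{\nu n}{2}\bigr]
  \le \exp\!\Bigl(-\frac{2(\nu n/2)^2}{N}\Bigr)
  = \exp\!\Bigl(-\frac{\nu^2 n}{1 + 2\nu}\Bigr).
```

The exponent is $\Omega(\nu^2 n)$, which matches the bound stated in the text.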

On the other hand, conditioned on $G$ there is a significant probability of a decoding error. To
see why this is the case, first note that conditioned on $G$ the error-bounded channel will simulate
$W_s(c)$ nearly perfectly. Moreover, the event $G$ is symmetric in $c$ and $s$, and so conditioning on $G$
does not help to distinguish $W_c(s)$ from $W_s(c)$. By the same reasoning as in the previous proof,

$$\Pr(\text{incorrect decoding} \mid G) \ge \tfrac12 - o(1).$$

Since $G$ has probability at least $\nu$, the channel causes a decoding error with probability at least
$\tfrac{\nu}{2} - o(1)$, in expectation over the choice of $s$. Hence, there exists a specific string $s^*$ for which the
channel causes a decoding error with probability at least $\tfrac{\nu}{2} - o(1)$. This completes the proof of Theorem C.1.
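
The final step can be spelled out as follows. This is a sketch under assumptions: $T$ denotes the event that the error threshold is reached (a name introduced here, not in the original), the error-bounded channel and $W_s$ are taken to produce the same output whenever $T$ does not occur, and the constant $\tfrac{\nu}{2}$ is the value that these bounds yield.

```latex
\Pr[\text{error}]
  \;\ge\; \Pr[G]\cdot\Pr\bigl[W_s(c)\ \text{is decoded incorrectly} \,\big|\, G\bigr] \;-\; \Pr[T \wedge G]
  \;\ge\; \nu\,\bigl(\tfrac12 - o(1)\bigr) - \exp\bigl(-\Omega(\nu^2 n)\bigr)
  \;=\; \tfrac{\nu}{2} - o(1).
```

Since this bound is an average over the random state $s$, at least one fixed state $s^*$ must achieve an error probability that is no smaller than the average.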